OPTIMAL REFINEMENT OF THE RATING
SCALE*


HORACE CHAMPNEY AND HELEN MARSHALL


THE rating scale, as an instrument for serious research,
has been viewed with considerable distrust, if not disdain,
by the psychometrician. Where, as a last resort,
the technique has been used, refinement beyond five- or seven-
point scales has seldom been attempted.

This attitude seems to stem in part from a much quoted 
study by P. M. Symonds (1924). Starting with the Shep- 
pard-Kelley formula (Kelley, 1924, p. 168) for correcting a 
correlation coefficient for grouping, Symonds derives a func- 
tion giving the optimal number of scale divisions for various 
reliabilities. Reading from his table it is found, for instance, 
that a seven-point rating scale which yields a reliability co- 
efficient of .60 is optimally refined. This is because perfect 
refinement would raise the correlation only to .627, and the 
loss of .027 corresponds to a difference in the alienation co- 
efficient of .0213, a value which Symonds adopts as a maximum 
tolerable loss. He further states that under good average conditions
.55 is about the average reliability of personality ratings,
and that “in the rating of human traits one cannot expect
even under the best conditions reliabilities of over .60 or
.70. In such ratings there is no object in having scales of more
than 7, 8, or 9 classes.”
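The arithmetic behind Symonds' criterion can be retraced in a few lines. The sketch below assumes the customary definition of the alienation coefficient, k = sqrt(1 - r^2), which the text does not spell out; the function name is ours.

```python
import math

def alienation(r: float) -> float:
    """Coefficient of alienation, k = sqrt(1 - r^2) (assumed definition)."""
    return math.sqrt(1.0 - r * r)

# Symonds' example: a seven-point scale yielding r = .60, against
# r = .627 under perfect refinement.
loss_in_r = 0.627 - 0.60
loss_in_k = alienation(0.60) - alienation(0.627)
print(f"loss in r = {loss_in_r:.3f}, loss in k = {loss_in_k:.4f}")
```

With r rounded to three places the loss in k comes out at about .021, which agrees with the quoted .0213 to within the rounding of r.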

In the current work with rating scales at the Fels Research 
Institute we are getting results which seem to question the 
validity of Symonds’ conclusions. Our data, when treated 
as an empirical check, fail to conform to the Symonds function. 
To obtain correlations within the tolerable loss set by Symonds 
we often find it necessary to use more classes than he indicates,
occasionally up to three times as many. Both


* From the Samuel S. Fels Research Institute, Antioch College, Yellow
Springs, O.
empirical and logical considerations seem to suggest that the
Sheppard-Kelley formula is an erroneous point of departure, 
and that several significant aspects of the problem have not 
been taken into account. The issues seem to be rather in- 
volved, and we have not attempted a complete solution, but 
merely present some data and theoretical considerations for 
what they may be worth. 

The scales used have to do with parent behavior, as observed 
by a full-time home visitor. They are described in detail by 
Champney (1939). For the present discussion it is impor- 
tant to note merely that the scale is graphic, and that by 
properly placing a millimeter stick along the 9-centimeter rating
line, and reading the position of the “X” to the nearest
millimeter, scores will be obtained ranging from 10 to 99.
Such a score is referred to as a “millimeter score.” By simply
dropping off the second digit we have scores ranging from
1 to 9, or “centimeter scores.” In calculating standard deviations
and correlations it is clear that these scores may be
treated respectively as ungrouped data, and as data grouped 
into classes of ten-millimeter intervals. 
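The relation between the two scorings may be put as a short sketch. The function name and the sample values are hypothetical and are not taken from the study.

```python
def centimeter_score(mm_score: int) -> int:
    """Drop the second digit of a millimeter score (10-99), leaving 1-9."""
    if not 10 <= mm_score <= 99:
        raise ValueError("millimeter scores run from 10 to 99")
    return mm_score // 10

mm_scores = [10, 47, 52, 99]   # hypothetical ratings, for illustration only
cm_scores = [centimeter_score(s) for s in mm_scores]
print(cm_scores)
```

Each centimeter score thus stands for a class ten millimeters wide, which is the grouping referred to in the text.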

The first eight scales put into service included two alter- 
native forms on Sociability of the Family. When 25 families 
had been rated we found a correlation between Form A and 
Form B, using the coarse, centimeter scores, of .60. According
to Symonds, that was about what we might have expected;
further, the true correlation should be about .63; and a seven- 
point scale was justified. As a matter of fact, the nine-point 
scale apparently provided by the centimeter scoring was 
actually functioning only as a seven-point scale with two 
unused points at one end, 6 standard deviations giving a
range of only about 7 centimeters. 

But when we correlated the “over-refined,” millimeter
scores, instead of the expected .63, we found an r of .76, an
increase more than five times as great as the increase predicted.
If the Sheppard-Kelley function were augmented to
predict a difference of this size (a dubious procedure), Sy- 
monds’ treatment would call for a scale with about 22 classes. 
The next 25 families were correlated, and the next 30, with
the results, somewhat less startling, shown in Table 1. If


TABLE 1
Home Visit Ratings: Sociability of Family, Forms A and B

Number of         r_AB           r_AB          Average
families        mm. units      cm. units        sigma

   25             .762           .602          11.6 mm.
   25             .823           .715           9.3
   30             .697           .642           7.5

Combined:
   80             .766           .665           9.7 mm.
   σ_r          ± .047         ± .062

Critical ratio of difference 2.8.





the combined centimeter r of .665 is corrected by the Shep- 
pard-Kelley formula, we get .725, as compared with an actual 
millimeter r of .766—the formula accounts for only about half 
the actual difference. 
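The correction just mentioned can be approximated as follows. This is a sketch under stated assumptions, not the authors' own computation: it assumes that the formula amounts to applying Sheppard's variance correction (subtracting one twelfth of the squared class interval) to both standard deviations, and that the 9.7 mm. sigma of Table 1 describes the ungrouped scores on both forms. The function name is ours.

```python
import math

def sheppard_kelley(r_grouped, sd_x, sd_y, interval):
    """Correct a grouped-data r by Sheppard's variance correction.

    sd_x and sd_y are standard deviations of the grouped scores; each
    variance is reduced by interval**2 / 12 before r is rescaled.
    """
    corr_x = math.sqrt(sd_x ** 2 - interval ** 2 / 12.0)
    corr_y = math.sqrt(sd_y ** 2 - interval ** 2 / 12.0)
    return r_grouped * (sd_x * sd_y) / (corr_x * corr_y)

# Assumption: 9.7 mm. is the ungrouped sigma for both forms, so the
# grouped sigma is estimated by adding the Sheppard term back.
fine_sd = 9.7
interval = 10.0
grouped_sd = math.sqrt(fine_sd ** 2 + interval ** 2 / 12.0)

r_corrected = sheppard_kelley(0.665, grouped_sd, grouped_sd, interval)
print(round(r_corrected, 3))
```

Under these assumptions the corrected value falls at roughly .72, close to the .725 reported and well short of the obtained millimeter r of .766.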

To study further the effect of refinement of scoring upon 
obtained correlation we tried some intermediate class intervals. 
The results, based on the same 80 pairs of ratings, are plotted 
in Figure 1. The class intervals used are indicated along the 
top. Along the bottom is the basic abscissa scale, in terms of 
the resulting number of scale divisions in a 6-sigma range. 
The ordinates are plotted in terms of r², on the assumption
that this provides a closer approximation to equivalent units 
than does r. Corresponding values of r, however, are given 
on the right. 

Since the arbitrary location of the class boundaries em- 
ployed in grouping the data introduces considerable error in 
the resulting correlation coefficient, we calculated, for each 
size of interval, a number of correlations each using different 
class boundary locations. These r²’s are indicated by the
circles, and their respective means by the black dots. This
gives us a curve for increasing correlation with increasing refinement
of scaling, gradually leveling off to a peak at the point
“p,” and then falling very slightly as the 2 mm. class interval is
refined to 1 mm.
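The procedure of regrouping the same ratings under every possible boundary location may be sketched as below. The data are simulated (a common "true" score plus independent rater error, with parameters chosen arbitrarily), so the figures printed illustrate the method only and are not the study's results. All function names are ours.

```python
import math
import random

def pearson(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def group(scores, interval, offset):
    """Class index of each score for classes `interval` mm. wide,
    with the boundaries shifted by `offset` mm."""
    return [(s + offset) // interval for s in scores]

def r_by_boundary(xs, ys, interval):
    """One correlation per possible boundary location."""
    return [pearson(group(xs, interval, k), group(ys, interval, k))
            for k in range(interval)]

def simulated_rating(true_score, error_sd):
    """Hypothetical millimeter score, clipped to the 10-99 range."""
    return min(99, max(10, round(true_score + random.gauss(0, error_sd))))

random.seed(1)
true_scores = [random.gauss(55, 9) for _ in range(80)]
form_a = [simulated_rating(t, 5) for t in true_scores]
form_b = [simulated_rating(t, 5) for t in true_scores]

for interval in (10, 5, 2, 1):
    rs = r_by_boundary(form_a, form_b, interval)
    mean_r2 = sum(r * r for r in rs) / len(rs)
    spread = max(rs) - min(rs)
    print(interval, len(rs), round(mean_r2, 3), round(spread, 3))
```

The spread between the highest and lowest r for a given interval corresponds to the scatter of the circles, and the mean r² to the black dots.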

Since Symonds gives the optimal number of scale divisions 
for various obtained correlations, we can plot his function on 
the same graph, obtaining the line AB. Now, Symonds’ curve 


[FIGURE 1. Ordinate: r², with corresponding values of r at the right. Abscissa: number of scale divisions, n = 6σ/c. Top scale: class interval in mm. units.]



crosses the empirical curve at the point “x.” This is the
point at which our obtained value corresponds to Symonds’
theoretical optimum, r reading .73, and n, 9 scale divisions.
Symonds predicts that with increasing refinement the curve
will level off at the height “q,” rather than rise about twice
as far to “p.”

Symonds’ treatment does not provide for any drop in the 
curve beyond the peak, but that such a drop is to be expected 
can readily be demonstrated. After adding to each centimeter
score a second digit drawn at random, it was found that
the resulting correlation fell off to the point “w.” This was,
in effect, testing what happens to r when refinement of scoring 
is increased beyond any significant discrimination on the part 
of the rater. 

We can now state the following hypothesis: Added refinement
of scoring increases the reliability up to the point where
the added significant discrimination is balanced by added
random error. Beyond this point reliability falls off slightly.
With a graphic scale this point (“p”) can be determined
empirically, and represents the theoretical optimum degree of 
refinement called for by the fineness of the discriminations 
involved. 

If for practical purposes it is found that the resulting 
number of scale divisions is cumbersome, any arbitrary criter- 
ion of maximal acceptable loss in reliability may be applied to 
the empirical curve, and the appropriate refinement read off 


TABLE 2
Correlation of Home Visit Ratings with Reratings 3 Weeks Later
by Same Rater. 20 Cases

                                        r             r         Difference
Scale                               mm. units     cm. units       in r²

Adjustment of Home ................ [illegible]     .634          +.19
Activeness of Home ................    .814         .672          +.21
Sociability of Family (B) .........    .626         .568          +.07
Child-Centeredness of Home ........    .622         .530          +.11
Duration of Contact by Mother .....    .736         .637          +.13
Intensity of Contact ..............    .787         .755          +.05
Suggestion ........................ [illegible]     .788          +.08
Babying ...........................    .840         .757          +.14
Understanding .....................    .836         .798          +.06
Emotionality ...................... [illegible]     .774          +.08
Affectionateness .................. [illegible]     .402          +.24
Sociability of Family (A) .........    .643         .630          +.02

Average* ..........................    .752         .672          +.114

* Average r = root mean square; average difference in r² = M r²(mm.) − M r²(cm.).
on the base line. Symonds’ criterion, applied to our data,
would accept a drop in r from .77 to .75 (or from “p” to the
line “yq”), and call for a twelve-point scale. If a drop in
r² of .01 is taken as the criterion, we find we need a scale of
22 points. It might be argued that where adequate reliability 
is as difficult to attain as is the case with rating scales it 
should be conserved as much as practicable rather than 
squandered for small cause. 

To answer partially the question of sampling error, we have 
compared millimeter and centimeter correlations on a number 
of different scales and with different kinds of data. 

Table 2 shows such a comparison based on reratings for 12 
different scales, totaling 240 cases. The centimeter scores are 














TABLE 3
Summary: Correlations, Millimeter vs. Centimeter Units

                                              Average* r          Difference
                                             mm.       cm.        M r²(mm.) −
                                            units     units       M r²(cm.)

Home visit rating with rerating after
  3 weeks. 12 scales. 20 cases ..........    .752      .672         +.114
Home visit rating with case history
  rating by 2nd rater. 10 scales.
  34 cases ..............................    .586      .560         +.031
Nine staff members rating 20 homes on
  basis of general acquaintance. Each
  rater correlated with pool of 3-7
  other raters. 7 scales ................    .629      .609         +.024
Home visit ratings with above staff
  pool. 14 cases. 8 scales ..............    .685      .623         +.081
Home visitor rating on basis of general
  acquaintance with rerating same day.
  3 scales. 10 cases ....................    .896      .782         +.190
Average ratings (pools of 1-8 raters,
  av. 4.5) with similar reratings.
  20 cases. 3 scales ....................    .898      .826         +.124
Two forms, “Sociability of Family.”
  80 cases ..............................    .766      .665         +.143
Above two forms, but with 2nd digit of
  mm. scores drawn at random. 80 cases ..    .620      .665         −.058

* Root mean square.
seen to show an average loss in r² of .114, just about twice
the amount accounted for by the Sheppard-Kelley formula.

Table 3 gives similar figures for various kinds of intercorrelations
totaling 1322 cases. The last item is inserted
for comparison; it gives the change in r found when the millimeter
scores represent no additional discrimination over the
centimeter scores, but merely random increments.

To summarize, a preliminary analysis of the effects of group- 
ing (or of coarsening the scale) may be outlined as follows: 

1. Since for any given size of class interval there are several
possible arrangements of class boundaries, and since each such 
arrangement will yield a somewhat different correlation co- 
efficient, grouping tends to increase the random error of the 
correlation, adding to the ordinary sampling error an error of 
boundary sampling. The dispersions of the circles in Figure 
1 suggest that for coarse scaling this effect is considerable. It
involves no constant error, however, and is mentioned here 
only because it seems to imply a source of unreliability in 
grouped data and coarse scales which has been rather neglected 
in the usual treatment of the problem. 

2. Grouping tends to increase the standard deviation of the 
distribution. This phenomenon is commonly explained as due 
to the fact that the mid-point of each class is taken to repre- 
sent the scores in the class whereas in reality their distribu- 
tion is skewed and their mean is nearer the mean of the entire 
distribution. This tends to increase the value of the devia- 
tions, introducing a constant error which may be counteracted 
by using Sheppard’s correction. Kelley and Symonds, how- 
ever, go further and assume that the correction should also 
be applied to correlation, since otherwise the augmented 
standard deviations make the coefficient too low. 

On the other hand, our data do not seem to fit this assumption.
We suggest that perhaps the phenomenon treated
by Sheppard’s correction is not the heart of the refinement
problem, since it affects the numerator term as well as the
denominator, with the result that it introduces little or no
constant effect in the reliability coefficient.










3. A more promising rationale for the problem would seem 
to call for a consideration of what sort of variance is removed 
by coarsening the scale. The coarsening or grouping process 
may be thought of as analogous to striking off decimal places 
from the scores. Two sorts of effects may be distinguished: 

(a) If the digits removed are purely random, insignificant, 
uncorrelated with the variable being measured, then the effect 
is to decrease the variance. The digit which varies at random 
is, in other words, replaced by a constant (zero), reducing 
the variance but not the covariance, and hence increasing 
the correlation. This is essentially what happened in Table 
3 when the artificially randomized second digits represented 
in the last item were removed and the correlation rose from 
.62 to .66. 

(b) If, however, the removed digits are significant, i.e., 
correlated with the variable being measured, then their re- 
moval also reduces the covariance, and hence tends to reduce 
the correlation. This, we believe, is the major constant error
involved in grouping, and largely accounts for the shape of the 
curve suggested in Figure 1. 

Since the two effects, (a) and (b), tend to cancel each other, 
and since, as the scale is scored with greater and greater 
refinement, the added variance becomes more and more purely 
random, and, at the same time, of smaller and smaller magni- 
tude, we should expect the theoretical curve of reliability to 
rise sharply and then level off gradually as the two effects 
balance each other, and then to fall slightly and level off 
asymptotically as infinite refinement is approached. 

The formula which best expresses these effects is probably 
the correlation of two sums,* considering the refined score as 
consisting of a coarse score plus an increment: 


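$$
r_{(x+p)(y+q)} = \frac{r_{xy}\sigma_x\sigma_y + r_{xq}\sigma_x\sigma_q + r_{yp}\sigma_y\sigma_p + r_{pq}\sigma_p\sigma_q}{\sqrt{\sigma_x^2 + \sigma_p^2 + 2r_{xp}\sigma_x\sigma_p}\;\sqrt{\sigma_y^2 + \sigma_q^2 + 2r_{yq}\sigma_y\sigma_q}}
$$
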
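The formula for the correlation of two sums may be tried out numerically. In the sketch below x and y stand for the coarse scores on the two forms and p and q for the increments added by finer scoring; the function name is ours. The coarse sigma of 10.1 mm. is an assumed round figure, not a value reported in the text, and the random digit is taken as uniform on 0 to 9.

```python
import math

def r_of_sums(r_xy, r_xq, r_yp, r_pq, r_xp, r_yq, s_x, s_y, s_p, s_q):
    """Correlation of (x + p) with (y + q) from component r's and sigmas."""
    numerator = (r_xy * s_x * s_y + r_xq * s_x * s_q
                 + r_yp * s_y * s_p + r_pq * s_p * s_q)
    denominator = (math.sqrt(s_x ** 2 + s_p ** 2 + 2 * r_xp * s_x * s_p)
                   * math.sqrt(s_y ** 2 + s_q ** 2 + 2 * r_yq * s_y * s_q))
    return numerator / denominator

# Effect (a): second digits drawn at random, uncorrelated with everything.
s_digit = math.sqrt((10 ** 2 - 1) / 12)   # sigma of a uniform digit 0-9
s_coarse = 10.1                           # assumed coarse sigma, in mm.
r_random = r_of_sums(0.665, 0, 0, 0, 0, 0,
                     s_coarse, s_coarse, s_digit, s_digit)
print(round(r_random, 3))
```

With increments that are pure noise the numerator is unchanged while both denominators grow, so r falls below .665; under the assumed sigma it lands near the .620 listed in the last row of Table 3. Setting the increment sigmas to zero returns the coarse r unchanged.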
In conclusion it may be stated, pending further empirical 
and theoretical verification, that the rating scale merits more 
* See Dunlap and Kurtz (1932) formula 267. 
refined treatment than it is usually given. The optimal refinement
for any particular rating technique is a function of the
discrimination achieved by a given degree of refinement in 
excess of the discrimination achieved by a lesser degree of 
refinement. This increment of discrimination seems to be 
sensitive to the conditions of measurement and not reducible 
to an arbitrary function of grouping such as the one proposed 
by Symonds. If this hypothesis is borne out by further in- 
vestigation it will follow that at least for research purposes 
and under favorable rating conditions the current practice 
of limiting ratings to five- or seven-point scales may often 
give inexcusably inaccurate results. Perhaps it would not be 
too conservative to suggest that the usual 18- to 24-step stand- 
ard be applied to rating scale practice unless it is shown that 
for a particular job either accuracy is not desirable or discrim- 
ination beyond seven points is not to be attained. 
DEPENDABILITY AND CLERICAL
APTITUDE 


GEORGE J. DUDYCHA 
Ripon College 


IN business and industry increasing emphasis is being placed
on aptitude testing and the conviction is growing among 
employers that an employee well placed, according to his 

specific abilities, pays greater dividends than one chosen at 
random by trial and error methods. As added tests are devised
and standardized such placement becomes more efficient;
for a large battery of tests affords a better picture of the 
testee. A greater advance has been made in the measurement 
of aptitudes and specific abilities, however, than in the mea- 
surement of personality and character traits. It is true that 
such significant traits as introversion-extroversion, dominance, 
tendency toward neuroticism, and sociability have been mea- 
sured, but there are many others which are equally important. 
An employer may wish to know whether a prospective em- 
ployee has mechanical aptitude, or whether he possesses cer- 
tain skills, or specific abilities; but success on the job may not 
depend solely on these measured traits. Personality and char- 
acter traits may be just as important. Honesty, cooperative- 
ness, dependability and similar traits may be of prime impor- 
tance for success in certain positions, and yet the only knowl- 
edge an employer may have of a prospect’s character is from 
recommendations, consisting of general statements, written by 
well-meaning but uncritical friends.

Recently the writer has made observations on the behavior 
of college students in two types of life-situations: punctuality 
(2, 3, 4) and dependability (5). In each of these studies the 
students had no knowledge of the fact that their behavior was 
under scrutiny and hence their behavior was not modified to 
meet the expectations of the observer. In the present paper, 
we shall be concerned with the last study, dependability, and 
its relationship to clerical ability as measured by the Minnesota
Vocational Test for Clerical Workers (1).


PROCEDURE 


Observations on dependability were necessarily made in a 
rather limited field of behavior—the consistency or inconsis- 
tency with which students returned borrowed books on or be- 
fore the due-date to the college library. The measure of their 
dependability was in terms of the average fine paid per book
withdrawn during the school year. No student was included 
in the study unless he withdrew at least five books on five dif- 
ferent occasions. However, the median number of with- 
drawals made was 13.38 and the median number of books with- 
drawn was 18.45. 

A total of 244 students was included in the original study. 
These were divided into three dependability groups: very de- 
pendable, averagely dependable, and undependable. The cri- 
teria used for making the divisions were: (1) The very de- 
pendable group includes all students who made at least five 
withdrawals and were not fined once during the whole year. 
(2) The group of average dependability includes all students 
who, having made at least five withdrawals, had an average 
fine per book withdrawn which did not exceed $.010 and were 
fined less than 17 per cent of the time. (3) The undependable 
group includes students who had an average fine per book 
withdrawn of $.011 or greater, and were fined on 17 or more 
per cent of their withdrawals. This division placed 75 stu- 
dents (46 men, 29 women) in the very dependable group, 54 
students (30 men, 24 women) in the undependable group, and 
the remaining students (115) in the average group. 

More recently 110 of these students were given the Minne- 
sota Vocational Test for Clerical Workers and comparisons 
were made between their scores on dependability and clerical 
ability. Of these 110 students, 35 are from the dependable 
group, 53 from the group of average dependability and 22 
from the undependable group. 
RESULTS


Since the clerical test consists of two parts—number-check- 
ing and name-checking—the chi square formula was applied 
to each. The chi square between number-checking and de- 
pendability is 3.98 with a P-value of .42; between name-check- 
ing and dependability it is 9.10 with a P-value of .06. Obvi- 
ously the first of these P-values is not significant but the 
second is practically so. We may say, then, that although 
number-checking is not significantly associated with dependa- 
bility, name-checking is. 
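The P-values quoted above can be approximately recovered from the chi squares. The text does not state the degrees of freedom; the sketch assumes 4 (as for a 3 x 3 table of dependability group by score class), chosen because it brings the computed values close to those reported. For even degrees of freedom the tail probability has a closed form, used here so that nothing beyond the standard library is needed. The function name is ours.

```python
import math

def chi2_tail_even_df(x: float, df: int) -> float:
    """P(chi square > x) for an even number of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form holds only for positive even df")
    half = x / 2.0
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms

DF = 4  # assumption: not stated in the text
p_number = chi2_tail_even_df(3.98, DF)   # number-checking
p_name = chi2_tail_even_df(9.10, DF)     # name-checking
print(round(p_number, 2), round(p_name, 2))
```

On this assumption the first value comes out near .41 (against the reported .42, presumably read from a table) and the second near .06.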

Since we have a dependable group and an undependable 
group, we may further compare the median scores of the two 
groups on each part of the clerical test. First as to number-
checking. The median¹ obtained by the dependable group is
22.7 and by the undependable group is only 12.5. When we 
compute the reliability of the difference between these two 
medians, we find that the critical ratio is 1.81 or 89 chances in 
100 that the difference is a true difference. Although this is 
not a significant difference between the two extreme dependa- 
bility groups, we may say that there are nearly nine chances 
in ten that undependable students are either slower or less 
accurate, or both, than dependable students in perceiving likenesses
and differences between pairs of numbers. A second
interesting fact is that the dependable students are somewhat 
more variable in their behavior on the number-checking test 
than the undependable ones. The Q’s for the two groups are 
18.28 and 15.38. 

The second part of the clerical test consists of checking pairs 
of names which may be identical or which may be similar but 
different in some detail. The median earned by the dependa- 
ble group on the name-checking test is 32.05, that earned by 
the undependable group is 21.0. Computing the reliability of 
the difference gives a critical ratio of 1.33 or 82 chances in 100 

¹ The scores on each part of the clerical test are in terms of percentiles
since the raw scores earned by women are usually somewhat higher than
those earned by men.
that the obtained difference is a true difference. Thus we may
say that there are about eight chances in ten that undepend- 
able students will receive lower scores than dependable stu- 
dents in name-checking. There is practically no difference 
between the two groups in variability as indicated by the Q’s. 

Another interesting comparison is between the middle 
group, or the one of average dependability, and the two ex- 
treme groups. The middle group obtained medians of 20.9 on 
number-checking and 32.7 on name-checking. These are prac- 
tically the same as those earned by the dependable group. 
When this group is compared with the undependable one the 
critical ratios are 1.74 for number-checking and 1.50 for name- 
checking, or 88 and 84 chances in 100 of a true difference. 

We may conclude, then, that there are between 82 and 89 
chances in 100 that undependable students will score lower on 
each part of the Minnesota Vocational Test for Clerical 
Workers than students of marked or average dependability. 
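The conversion from critical ratio to "chances in 100" is not given in the text. The reported pairs are reproduced to within one chance in 100 if it is assumed that the critical ratio is the difference divided by its probable error (0.6745 sigma) and that the chances are the one-tailed normal probability that the true difference lies in the obtained direction. The sketch below rests on that assumption; the names are ours.

```python
import math

PE_PER_SIGMA = 0.6745  # one probable error, expressed in sigma units

def chances_in_100(critical_ratio: float) -> float:
    """One-tailed normal probability (x 100), ratio taken in PE units."""
    z = critical_ratio * PE_PER_SIGMA
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Critical ratios and chances as reported in the text.
reported = {1.81: 89, 1.33: 82, 1.74: 88, 1.50: 84}
for ratio, chances in reported.items():
    print(ratio, chances, round(chances_in_100(ratio), 1))
```

Had the ratios been in sigma units, 1.81 would correspond to about 96 chances in 100 rather than 89, which is why the probable-error reading is adopted here.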

The reader may be of the opinion that there is a sex factor 
operating. Should one assume that men are less dependable
than women, coupled with the fact that women earn higher 
scores on the clerical test, one might believe that the obtained 
differences are due to sex. This is not the case. First, the 
sexes are approximately equally represented in the two de- 
pendability groups. There are 18 men and 17 women in the 
dependable group, and 13 men and 9 women in the undependable
group.² Second, the sex difference in speed of perception
and response is eliminated by using the percentile
tables for each sex provided by the authors of the test. 


CONCLUSIONS 


We have found: 1. That there is a practically significant 
association between clerical ability, as measured by name- 
checking, and dependability in returning borrowed books to 
the library. 


² The ratio of men to women in the undependable group is about the
same as that of the college population observed. 
2. There are only about three chances in five that the association
between number-checking and dependability is not due
to chance. 

3. There are eight or nine chances in ten that undependable
students will receive lower scores on both parts of the Minnesota
Vocational Test for Clerical Workers than students
of either exceptional or average dependability.

4. Scores earned on the Minnesota Vocational Test for 
Clerical Workers may throw some light on the testee’s depend- 
ability, in at least one type of field of conforming behavior. 
THE INTELLIGENCE OF DIABETIC CHILDREN
WITH SOME CASE REPORTS 


FLORENCE M. TEAGARDEN 
University of Pittsburgh 


THE writer has recently had occasion to examine six
diabetic children. As she did so, reviewing the litera- 
ture meantime and talking with physicians and laymen, 

she was impressed with the prevalence of the idea that diabetic 
children are intellectually superior. One often hears physi- 
cians speak to this point. The parents of such children eagerly 
seize upon the idea perhaps thereby trying to compensate for 
what they must regard as a terrible misfortune inflicted upon 
their children. 

The writer has been able to discover only three references 
which report intelligence tests of diabetic children (16, 30, 29).¹
White, writing on diabetic children, in Joslin’s book (16) says,
“Though the child has knowledge and intelligence even more
than average he lacks the wisdom to follow the discipline of
treatment.” In another place White (30), still speaking of
diabetic children says, “Precocity of mental development has
persisted in spite of the delay in physical development.” Joslin
(16) and White (30) discuss the same 169 diabetic children.
They say that the Intelligence Quotient of these children was
“higher by 10 on the average than a control series whose median
age was the same.” No other facts regarding the control group
are given, however, so the significance of the figures concerning 
the diabetics is somewhat doubtful. They further state that 
15 per cent of their children had IQ’s below 90; that 55 per 
cent had IQ’s of 90 to 110; and 30 per cent had IQ’s of 110 and 
over. (Presumably these were Binet IQ’s.) “Thus,” they

¹ Since writing this statement, the author has seen a Master’s thesis
prepared by G. D. Brown at the University of Minnesota (4a). Brown
reports that his diabetic children showed no deviation from average populations
as regards intelligence.
conclude, “one-third of our diabetic children were of superior
intelligence.”

Two questions immediately come to mind. First, if the 
diabetic children reported by Joslin and White were somewhat 
above average, does it follow that diabetic children as a group 
are likewise above average in intelligence? How highly 
selected from the standpoint of home background and intelli- 
gence of parents were these 169 diabetic children? What would 
we find to be the intelligence of the same number of diabetic 
children discovered by routine medical examinations in schools 
or in clinics for other diseases?

Our second question is: How intellectually superior is a group
of children one-third of whom have IQ’s of 110 or over? It 
will be recalled that in Terman’s first standardization group 
(26) 20 per cent of the children had IQ’s of 90 or less; about 
60 per cent, 90 to 110; and about 20 per cent, 110 or over. In 
their more recent work, Terman and Merrill (27) found of their 
3000 standardization children 34 per cent had IQ’s less than 
95; 46 per cent, 95 to 115; and 20 per cent, 115 or over. To 
say then that out of the 169 children studied by Joslin and 
White one-third have IQ’s of 110 and over is not proving the 
group to be outstandingly superior. 

West, Richey, and Eyre (29) studied the intelligence of 76 
diabetic children. They report that their group shows a some- 
what higher average than usual. White (30) quotes West, 
Richey, and Eyre (citation not given) to the effect that 9.1 per 
cent of their cases had IQ’s below 90; 47.3 per cent between 90 
and 110; and 43.6 above 110. 

There follow here brief case reports on the intelligence tests 
and history of six diabetic children whom the writer has ex- 
amined recently. It need hardly be said that no generalizations 
are intended on the basis of six cases. 

Case A was a negro girl studied first when she was one year, 
seven months, and 20 days old, and again when she was one 
year, eight months, and 11 days old. She had been hospitalized,
in coma, at the age of nine and one-half months. Subsequently
she had been hospitalized three times for diabetic treatment.
At 17 months of age she had only four teeth and could
not stand alone. Within recent months Case A sucked first her 
hands, then her feet, then her lower lip. It was difficult to 
keep her from putting fecal matter into her mouth. More 
recently she had begun to ruminate. There was almost no 
articulation. When Case A was chronologically one year, eight 
months, and 11 days old, she scored by the Linfert-Hierholzer 
Scale (19) a so-called mental age of 271 days, or about nine 
months. Her mental age was thus less than half her life age 
and her Development Quotient was 45. The rating of Case A 
on the same date by the California First Year Mental Scale (3) 
was a mental age of nine months. By the use of Gesell norms 
(11) (12) she had ratings in various functions ranging from 
about 8 to 12 months. In gross motor functions such as stand- 
ing and walking Case A most nearly approximated her own 
age level. In those functions partaking less of the motor and 
more of the intellectual she descended to the eight months level. 
Do we have here a decidedly feeble-minded child who also has 
diabetes or a child whose mental functions have been impaired, 
during the months in which the brain would normally be devel- 
oping at its most rapid rate, by disease so drastic as to have pro- 
duced coma again and again? Probably no one would hazard 
an answer to that question further than to say that probably the 
child was defective to begin with and that the possible 
deteriorating effects of coma await further investigation. 

Case B was a girl of 11 years, nine months. Her diabetes 
had been of about three years’ duration when the psychological 
test was administered. She had been in coma. She had also 
been diagnosed as a case of endocrine dysfunction. She was at 
the time about the height of the average nine year old girl and 
she had a very unusual hair distribution. Her mental age² was
seven years and six months. Her Intelligence Quotient of 64 
indicates definite feeble-mindedness of high grade moron type. 


² In all of the following cases, mental ages were derived by the use of
the new Revised Stanford-Binet Scale (27).
An analysis of this child’s successes and failures on the test
shows nothing that one does not find typically in the responses 
of feeble-minded children. If one had not known there was a 
history of disease one would have had no occasion to raise a 
question about possible deterioration. In all probability this 
case is a feeble-minded child, with endocrine dysfunction, plus 
diabetes. 

Case C illustrates a baffling series of end results the causes
for which are difficult to untangle. This somewhat macrocephalic
girl was 11 years, 10 months old. At seven months of
age she had rickets. At two years she had poliomyelitis. She 
had also had measles, whooping cough, and chickenpox. She 
was still showing signs of chorea from which she had recently 
been suffering. Her right shoulder had been broken twice. 
Although adequate tests have not yet been administered to 
prove natural handedness (and sidedness) there is no obvious 
reason for supposing that she is left-handed by nature. Prob- 
ably because of damage produced either by the poliomyelitis or 
by the broken right shoulder the child uses her left hand pre- 
dominantly. If she is naturally right-handed and is being 
forced by the results of disease or accident to use her left hand 
there is a possibility, of course, of some confusion by change 
of dominance. Case C’s mental age was eight years and eight 
months, and her Intelligence Quotient, 73. Under normal con- 
ditions one would say that this IQ represents borderline intel- 
ligence, with a strong suspicion of slight feeble-mindedness. 
Probably no experienced psychologist, however, under the cir- 
cumstances, would be willing to interpret these figures with the 
predictive value that the Intelligence Quotient usually carries. 
All one can say is that at the time of the examination the child 
responded like a child with a mental age of less than 12 and an 
IQ of 73. How much higher, if any, this IQ may prove to be 
when the chorea has subsided, no one can say. How much 
higher it might have been had poliomyelitis and diabetes not 
intervened, no one can say. At the present time, however, she 
is functioning on a borderline level. 

Case D was a boy 13 years and nine months of age. So far 
as is known, the diabetes was of only about two months’ dura- 
tion at the time his intelligence was studied. There is no report 
of coma. The boy secured a mental age of 11 years and six 
months. His Intelligence Quotient of 85 indicates dull normal 
intelligence. Since French is spoken almost exclusively in the 
boy’s home a test administered in English may not be an 
entirely fair measure for him. Language training might con- 
ceivably raise his IQ a few points. At any rate, perhaps we 
are quite safe in saying that at best he is of low average intel- 
ligence. 

Case E had had diabetes for about two years and had had 
coma. There was also a history of rheumatic heart disease. 
His chronological age was 11 years and seven months. His 
mental age was 11 years and no months. The consequent Intel- 
ligence Quotient of 95 indicates normal or average intelligence. 

Case F was the only one of the six cases who displayed superior 
intelligence. This boy was 13 years, two months old. His 
diabetes was of about four years’ duration. The child scored a 
mental age of 16 years and nine months. His Intelligence 
Quotient of 128 indicates very superior intelligence. When we 
find a child of this ability we are likely to raise no question of 
possible deterioration. Yet, as a matter of fact, do we 
know whether this child’s intelligence might have registered 
higher if he had had a perfect health record? An analysis of 
his distribution of successes and failures, however, does not 
indicate the gaps that we sometimes find in children who have 
been known to have suffered deterioration from some cause or 
other. 

In summary, then, these six cases of diabetic children ex- 
amined by standardized intelligence tests have Intelligence 
Quotients of 45 (with all of the reservations which must be 
made for quotients derived by the use of infant scales), 64, 73, 
85, 95, 128. Only two of these six children were, at the time 
they were examined, normal or above as regards amount of 
intelligence. The present findings, while exceedingly meagre, 
do not, at least, point in the direction of the supposed superiority 
of diabetics. 
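
The quotients listed above can be checked against the mental and chronological ages reported for the individual cases. The sketch below assumes the classical ratio definition, IQ = 100 × mental age / chronological age, with both ages expressed in months; the function names are illustrative only and do not come from the test manual. Under that assumption the reported quotients for Cases B, C, and E are reproduced exactly. For Cases D and F the same uncorrected ratio gives 84 and 127, one point below the reported 85 and 128. This may reflect an adjustment of chronological age for children over 13 in the revised scale, which the sketch does not attempt.

```python
def months(years, extra_months=0):
    """Convert an age given in years and months to months."""
    return 12 * years + extra_months


def ratio_iq(mental_age_months, chronological_age_months):
    """Uncorrected ratio IQ: 100 * MA / CA, rounded to a whole number."""
    return round(100 * mental_age_months / chronological_age_months)


# (mental age, chronological age) as reported in the text for three cases
cases = {
    "B": (months(7, 6), months(11, 9)),
    "C": (months(8, 8), months(11, 10)),
    "E": (months(11, 0), months(11, 7)),
}

quotients = {name: ratio_iq(ma, ca) for name, (ma, ca) in cases.items()}
print(quotients)
```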

That there may be relationships between intelligence, changes 
in intelligence, and particular mental functions on the one hand 
and sugar balance, insulin shock, coma, and dietary restrictions 
on the other hand seems not impossible as one reviews the litera- 
ture. When, however, one becomes interested in the psychiatric 
and psychological phases of diabetes one finds himself in some- 
what of a maze as to cause and effect. Which of the mental 
and emotional characteristics of sufferers from diabetes are asso- 
ciated with the factors which produced the disease in the first 
place? Which come about as a result of the disease? Which 
emerge as concomitants of enforced diet, changed regimen, 
social deprivation, or medication? 

Miles and Root (22) found that most of the inferiority of 
their diabetic subjects in the functions tested was in speed and 
span rather than in accuracy. They conclude that the retardation 
was in neuromuscular processes. Fitz and Murphy (8) 
likewise say that the lessened strength of their diabetic cases was 
due to a retardation of neuromuscular processes. Dashiell (7) 
found rapid impairment and rapid improvement in the func- 
tions tested, under varying degrees of hyper- and hypo-gly- 
cemia. The impairment was not, Dashiell thought, muscular 
in nature but was brought about by ‘‘changes produced in the 
higher centers of the nervous system.’’ 

Neurological and histological involvements in diabetes have 
also been studied. Griggs and Olsen (14) report spinal cord 
changes in a case upon which autopsy was performed. Baker 
and Lufkin (2) discuss cerebral lesions in connection with 
hypoglycemia. Freed and Wofford (9) report a case of subarachnoid 
hemorrhage accompanying shock therapy (in schizophrenia). 
Weill, Liebert, and Heilbrun (28) present evidence 
for histopathologic brain changes in hyperinsulinism. So far 
as the writer has been able to discover, the effects upon intelli- 
gence of such neurological changes as the ones just mentioned 
have not been measured by the use of intelligence tests in 
diabetic cases. 

The concomitance of diabetes and neuroses or psychoses has 
attracted several investigators, among them the following: Daniels 
(6), Rynearson and Moersch (24), Moersch and Kernohan 
(23), Boudreau (4), Sevringhaus (25), and Kepler and 
Moersch (17). The correlation of diabetes and schizophrenia 
has been mentioned in the literature frequently. 

Psychological and sociological factors are reported to be of 
considerable importance as regards the etiology of diabetes. 
Strouse (1) stresses the higher incidence of diabetes among well- 
to-do adults. Crile (5) pays tribute to the diabetic etiological 
factor present in the nervous strain of modern civilization. 
Menninger (20) discusses the ‘‘Psychological Factors in the 
Etiology of Diabetes,’’ and ‘‘The Interrelationships of Mental 
Disorders and Diabetes Mellitus’’ (21). 

As regards the psychological effects of restricted diet, which 
is an important aspect of diabetic therapy, there is a considerable 
literature. To mention only a few of the articles in this 
field we might cite Fritz (10), Laird, Levitan, and Wilson (18), 
and Hoelzel (15). Opinion differs from one physician to 
another as to the relative roles that should be played by diet 
and by insulin treatment in the control of diabetes. Further 
studies on diabetic children by the use of intelligence tests may 
be of significance on this point. The literature on diabetes as 
shown in the Quarterly Cumulative Index Medicus indicates the 
involvements with other diseases in which diabetic children also 
become entangled. The effect of these complications as well 
as possible intellectual deterioration brought about by insulin 
shock and coma will require much study. 

In summary, then, it appears there is reason for thinking that 
neurological and psychological functions may be impaired by 
diabetes, and that insulin shock may produce neurological dam- 
age. Dietary restrictions may also have deleterious effects 
upon mental processes. Psychiatric accompaniments of dia- 
betes have been claimed by some. In the light of this evidence 
it would seem unlikely that diabetic children as a group, when 
sufficient numbers have been studied, will be found to be superior 
intellectually. There would certainly seem to be no biological 
reason why diabetes should more frequently develop in 
individuals of superior intelligence. On the other hand, there 
may be economic and sociological factors of differential selection 
of cases for study that explain the supposed intellectual superi- 
ority of diabetic children. We still need figures to show what, 
if any, changes in intellectual and social level will be repre- 
sented as more diabetic clinics are opened. Furthermore, the 
psychological factor of nervous tension in the etiology of dia- 
betes may selectively penalize superior intelligence. 

The distribution of intelligence among large numbers of 
diabetic children remains for the psychologist to explore. Like- 
wise the possible effects upon mental processes and upon intel- 
ligence of coma, insulin shock, restricted diet, and other factors 
operating in the life of the diabetic child must be investigated. 
It is hoped that further studies contemplated by the writer may 
help to answer some of these questions and that other psycholo- 
gists may interest themselves in the study of diabetic children. 


INFLUENCES OF HEREDITY AND MUSICAL ENVIRONMENT ON THE SCORES OF 
KINDERGARTEN CHILDREN ON THE SEASHORE MEASURES OF MUSICAL ABILITY 


RUBY S. FRIEND 
From the Institute of Child Welfare, University of Minnesota 


THAT at least three of the Seashore tests (15) for musical 
ability (those designed to measure pitch discrimination, 
sense of intensity, and sense of consonance) can be given 
satisfactorily to children of preschool age was determined by 
McGinnis (8), who devised various methods for adapting these 
tests to the understanding of the young child. She states that 
by shortening the amount of a phonograph record played and 
by increasing the intervals between judgments, the tests might 
become ‘‘valuable research instruments for use with young 
children.’’ 

The present investigator, using the technique devised by McGinnis, 
gave the Seashore tests for pitch, intensity, and consonance 
to a group of kindergarten children and to their parents, 
in order to discover what relationship, if any, exists between 
the musical ability of five-year-old children and that of their 
parents. The children’s musical environment was also checked 
in order to test the statement of Seashore and Mount (16) that 
‘‘the fundamental powers in musical talent such as pitch discrimination, 
intensity discrimination, sense of consonance, 
tonal memory and time are developed quite apart from association 
with music.’’ 

Subjects. Almost all previous experiments with the Sea- 
shore tests have been carried out with older children or adults 
as subjects, although in the study by McGinnis the subjects 
were from 41 to 59 months old. She found that the scores 
apparently did not depend on either C A or M A. Our subjects 
in the present experiment were 42 children, 20 boys and 
22 girls, attending kindergarten at the Institute of Child Welfare 
at the University of Minnesota; also 25 of their fathers 
and 35 of their mothers. Table 1 shows the chronological and 
mental age distribution of the 42 children. 

TABLE 1 
Chronological and Mental Age Distribution of Child Subjects (in Months) 

Ages in months   Number of children     Number of children 
                 having specified C A   having specified M A 
87-89                     0                      1 
84-86                     0                      1 
81-83                     0                      5 
78-80                     0                      1 
75-77                     0                      5 
72-74                     1                      4 
69-71                     2                      7 
66-68                     7                      5 
63-65                    14                      5 
60-62                    13                      7 
57-59                     3                      0 
54-56                     1                      0 
51-53                     1                      1 

Mean chronological age = 63.1 months; S D = 3.8 
Mean mental age = 70.9 months; S D = 8.0 





The distribution of the children according to IQ was as follows: 

    140 and over   1        110-14   3 
    135-39         2        105-09   5 
    130-34         2        100-04   6 
    125-29         3        95-99    6 
    120-24         4        90-94    0 
    115-19         9        85-89    1 

Mean IQ = 113.8; S D = 12.8 


When the group was classified according to occupation of 
father, using the Minnesota classification for urban occupa- 
tions, 17 children fell in Group I, 3 in Group II, 13 in Group 
III, 6 in Group IV, and 3 in Group V. 

Selection and adaptation of tests. For our experiment we 
used the same tests that McGinnis used, those for pitch, inten- 
sity, and consonance. On the first measure, the subject judges 
whether the first or second of two tones in a series of pairs is 
higher; on the second, whether the first or second in each pair 
in another series is stronger or weaker; on the third, whether 
the first or second of two combinations of two tones each is 
more harmonious. Although Seashore considers the tests of 
pitch, intensity, and time discrimination the most important, 
the test of consonance was substituted for that of time, partly 
because McGinnis had found that children of preschool age 
were less confused by two sounds or groups of sounds than by 
three or more, partly because the present author found on 
some preliminary tests that determining the longer of two time 
intervals marking off three clicks proved too difficult for chil- 
dren of kindergarten age. 

As in McGinnis’ experiment, the tests were called ‘‘games’’; 
the descriptions ‘‘loud’’ and ‘‘soft’’ were substituted for 
‘‘strong’’ and ‘‘weak,’’ ‘‘the baby and the daddy bear’s 
voice’’ for ‘‘high and low,’’ and ‘‘pretty and ugly’’ for ‘‘better 
and worse’’; and practice with a mouth organ and with the 
phonograph was given until the child seemed to understand 
what was meant. Explanations and demonstrations from Sea- 
shore’s Manual of Instructions (14) were also given. 

Method. The phonograph records containing the tests were 
played for the children individually in a room set aside for the 
experiment. Each child had two trials, in most instances 
within a week of each other. In almost every case both sides 
of each record were played at one sitting. On the pitch test 
the child made 50 responses, on each of the others 100—a total 
of 500 on the two trials. The child’s judgments and his re- 
sulting scores were recorded, together with information as to 
age, time at which each trial was begun and finished, and the 
degree of attention and of interest shown. The scoring of 
attention and interest was on a five-point scale representing a 
range from close attention and willing cooperation and interest 
to marked hyperactivity and such strong objection that a 
record of performance could not be obtained. 

The children on the whole were interested and cooperative. 
Occasionally some of the younger children in the afternoon 
group seemed to be tired and their responses were not quick 
enough. The omissions caused in this way were more frequent 
on the test of pitch than on either of the other tests, and more 
frequent on the first than on the second trial. In only a few 
cases did the number of omissions warrant a repetition of the 
record. It was decided to handle the problem of omissions by 
scoring each test on the basis of the number of responses, 
rather than on an arbitrary basis of 100 or 50 as a perfect 
score. 
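
One plausible reading of this scoring rule is that each score was expressed as the percentage of judgments actually made that were correct, so that omitted items neither help nor penalize the child. The sketch below illustrates that reading only; the function name and the example figures are hypothetical and are not taken from the study.

```python
def percent_correct(correct, responses_given):
    """Score as a percentage of the judgments the child actually made."""
    if responses_given == 0:
        raise ValueError("no responses recorded")
    return 100 * correct / responses_given


# Hypothetical child: 100 items presented, 8 omitted, 69 judged correctly.
presented, omitted, correct = 100, 8, 69
score = percent_correct(correct, presented - omitted)
print(score)
```

On a fixed basis of 100 the same hypothetical child would score 69, so the choice of basis matters most for the children with many omissions.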


Reliability of tests. To establish the reliability of the three 
tests, the correlations between the first and second trials of 
each test were computed. These correlations are shown in 
Table 2 along with comparable figures from other investiga- 
tors, nearly all of which were obtained by testing older chil- 
dren or adults. The sum of the scores for our first trials of 
pitch, intensity, and consonance correlated .778 ± .041 with 
the sum of the second trials, a relatively high coefficient for the 
Seashore tests. The most reliable test in this series was that 
of intensity. It will be observed that the correlations obtained 
by McGinnis, working with young preschool children, were 
lower than those obtained by most investigators using older 
subjects, and that her correlation for pitch was negative. Our 
own correlations are much closer to those for older children 
and adults. 
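
The figure attached to each correlation (.041 for the combined reliability) appears to be a probable error, the heading used in Table 4. A minimal check, assuming the conventional formula PE = 0.6745 (1 − r²) / √N and taking N as the 42 children, reproduces the reported value. Both the formula and the choice of N are assumptions, since the text does not state how the probable errors were computed.

```python
import math


def probable_error(r, n):
    """Probable error of a correlation: 0.6745 * (1 - r**2) / sqrt(n)."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)


# Combined reliability coefficient, with N assumed to be the 42 children.
pe_reliability = probable_error(0.778, 42)
print(round(pe_reliability, 3))
```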

As a check upon Seashore’s statement (13) that musical 
talent is not a single ability but a hierarchy of abilities that are 
largely independent of each other, the children’s scores on the 
three tests were intercorrelated. The intercorrelations found 
(see Table 3) are higher than those found by Brown, Mc- 
Carthy, or McGinnis, with the exception of the correlation 
between pitch and consonance found by McGinnis. 

TABLE 3 
Correlations between Measures of Pitch, Intensity, and Consonance as 
Found by Several Investigators 

Measures correlated         Friend   Brown   McGinnis   McCarthy 
Pitch and intensity          .63      .25      .492       .09 
Pitch and consonance         .57      .29      .623       .21 
Intensity and consonance     .54      .04     -.26        .10 





In order to find how closely the results of the Seashore tests 
correlated with subjective judgments of the children’s musical 
ability, we asked both parents and kindergarten teachers to 
rate the children on a five-point scale. One kindergarten 
teacher taught all day and two assisted in the morning and 
afternoon classes respectively. To obtain the reliability of 
their ratings the combined scores of the two assistant teachers 
were correlated with those of the teacher who taught full 
time. These ratings when correlated with the children’s scores 
on the three tests combined gave low positive correlations: 
.11 ± .10 for the first trial; .15 ± .10 for the second; and .15 
± .10 for the two trials combined. These figures are fairly 
comparable with Brown’s correlations of .15 for pitch, .11 for 
intensity, and .17 for consonance. 

A somewhat higher correlation, .26 ± .11, was found between 
the sum of the parents’ ratings of the child and the 
child’s combined scores on both trials. The parents’ ratings 
of the children are slightly higher than those of the teachers. 
The ratings of the three teachers agree rather closely (r = .74), 
and teachers’ and parents’ ratings show a fair relationship: 
.44 ± .09 between teacher and mother, .45 ± .11 between 
teacher and father, and .55 ± .10 between the sum of the 
teachers’ ratings and the sum of the parents’ ratings. The 
correlation between the fathers’ and the mothers’ estimates 
of the children’s ability was .861. All these findings suggest 
that parents and teachers rating the musical ability of chil- 
dren are guided for the most part by factors other than those 
measured by the Seashore tests. 

Musical ability and environment. To discover how much 
association with music our kindergarten subjects had had, we 
asked their parents to check on a list of instruments those 
which they had in their homes, to add any others not listed, 
and to indicate whether the instruments in the home were 
used daily, once or twice a week, once or twice a month, rarely, 
or never. They were also asked to note any musical training 
that the child received, and his apparent progress. The scores 
received on this report were correlated with the child’s record 
on each of the three tests and on the three tests combined. All 
these correlations were negative, a fact that is difficult to 
explain. That between environment and pitch discrimination 
was -.24 ± .11; between environment and intensity discrimination, 
-.32 ± .10; between environment and consonance discrimination, 
-.14 ± .11. For the three measures combined 
the relationship was -.24 ± .11. The elimination of the 
radio from the list of instruments raised the correlation for 
the three tests combined to -.10 ± .12; but the correlation 
remains negative. 

Of course this result may be due to chance, since there were 
only 34 cases, or it may be traceable to some uncontrolled 
factors in the environment. It seems probable, however, that, 
as Seashore says, there is very little, if any, relationship be- 
tween musical environment and musical ability as measured 
by his tests. If this is true of much older children, it is likely 
to be true also of five-year-olds, whose musical awareness and 
opportunities for musical expression are usually limited. This 
was certainly true of our group. With the 
exception of one child who had had a few lessons on the drum, 
none of the children had had any formal musical training. 
Their interest in the music in their homes may, therefore, 
have been of a very passive kind. 

Parent-child resemblances. Since Galton’s day, many stud- 
ies have been made of the inheritance of mental, physical, and 
moral strains in family stocks. An excellent summary of the 
literature on the inheritance of mental characteristics is to be 
found in Schwesinger’s Heredity and Environment (17). We 
may mention here particularly the investigations of Pintner 
(14), Willoughby (29), Jones (8), and Schuster and Elderton 
(16), all of whom found positive correlations between the abil- 
ities of parents and children and between those of siblings. 
In studies of parent-child resemblances, the mother-child correlations 
generally tend to be slightly higher than the father-child 
correlations. 

In testing the parents of our kindergarten subjects, Sea- 
shore’s instructions were followed carefully. In addition to 
the tests, which were given only once, one of each pair of par- 
ents was asked to fill out the questionnaire giving information 
as to musical environment. Each parent was asked also to give 
an independent estimate of his or her own musical ability, the 
musical ability of the other parent, and that of the child. 

For our tests on pitch and intensity the correlations between 
parents and children were all positive, the highest, that be- 
tween the mean of the parents’ scores for intensity and the 
mean of the child’s two trials, being .456 ± .109 (see Table 4). 
For the test on consonance, however, all the correlations were 
negative. This test was the most difficult of the three, since it 
involved aesthetic judgment rather than mere acuity of hear- 
ing. The writer in testing the children doubted seriously 
whether they were basing their answers on real aesthetic judg- 
ment; and probably even the adults had difficulty in discrim- 
inating between what was ‘‘good’’ and what was simply pleas- 
ing to them personally. 

On the tests on pitch and intensity the resemblance between 
mother and child was slightly but probably not significantly 
greater than that between father and child—a finding in agree- 
ment with what has generally been recorded in studies of par- 
ent-child resemblance in mental traits. 

Parents’ ratings of themselves and of each other. The cor- 
relations between the parents’ ratings of themselves and of 
each other on the one hand and their actual test scores on the 
other were as follows: 
Mothers’ rating of selves and mothers’ test scores 
Fathers’ rating of selves and fathers’ test scores 
Wives’ rating of husbands and husbands’ test scores ...... .22 ± .13 
Husbands’ rating of wives and wives’ test scores 


TABLE 4 
Correlations between Scores of Parents and of Children 
on the Three Tests 

                                     Pitch        Intensity     Consonance 
Measures correlated                 r     PE      r     PE      r     PE 
Mean of parents’ scores and 
  child’s mean on two trials       .14   .14     .46   .11    -.11   .14 
Mother’s score and 
  child’s mean on two trials       .09   .11     .28   .20    -.08   .07 
Father’s score and 
  child’s mean on two trials       .02   .14     .16   .07    -.04   .14 





It will be seen that the fathers’ self-ratings correlate rather 
highly with their scores but that the mothers’ correlate negatively. 
The difference may reflect a social situation in which 
women, but not men, have been made to feel that musical 
ability is an asset, and consequently the less competent women 
have tended to over-estimate their talents. 


CONCLUSIONS 


1. The reliability coefficient between the two trials given the 
children, for all tests combined, was .778. The intercorrela- 
tions of the three tests were: .63 for pitch and intensity scores ; 
.57 for pitch and consonance scores; and .54 for intensity and 
consonance scores. 

2. The correlation between the parents’ ratings of the child 
and the child’s test scores was .264. That between the teachers’ 
ratings and the child’s test scores was .147. 

3. The musical environment correlated negatively with the 
child’s test scores for each of the three tests and for the three 
tests combined. 

4. The parent-child relationships were for the most part 
positive but not high. Correlations between the mean of the 
parents’ scores and the child’s scores were: on the test for 
pitch, .144; intensity, .456; consonance, -.111. There appeared 
to be a slightly closer relationship between mother and 
child than between father and child. 


INTRODUCTION 


THE chief purpose of psychological tests of aptitude is to 
aid in predicting the potentialities of an applicant’s success 
in the occupation being considered. Bingham (1) 
defines aptitude as ‘‘a condition indicative of a person’s power 
to acquire specified behavioral patterns of interest, knowledge 
and skill. It is a present condition with a forward reference, 
a condition or set of characteristics regarded as symptomatic, 
indicative of potentialities.’’ 

The problem of ‘‘vocational selection’’ or the selection of 
suitable applicants for any job is a serious one to every em- 
ployer. The many and varied types of skilled occupations 
found in modern industry demand reliable and efficient 
methods in the hiring and allocation of workers. 

Today the New York Telephone Company, representative of 
the Telephone industry as a whole, has undoubtedly as many 
varied types of scheduled occupations as will be found in any 
modern industry. 

1 The writers wish to express their indebtedness to Mr. J. W. Humphreys 
and Mr. E. A. Staples of the New York Telephone Co., Long 
Island Area, for cooperation and assistance in the preparation of the 
Test Battery and to Professor Clark L. Hull for reading and criticizing 
the manuscript. 

During the early days of its relatively short term of existence 
and before the business became too complex, the qualifications 
most sought in the applicant for employment were a 
sane mind and a sound body. Those qualifying in these respects 
were engaged with the knowledge that they could be 
used readily on some type of work for which they subsequently 
demonstrated fitness. However, as the business grew the need 
for employees with specific qualifications arose and in recog- 
nition of this need the Company focused greater attention on 
the selection of its personnel. Employment Bureaus were 
established and manned by interviewers fitted, by experience, 
to recognize in the applicant the qualifications peculiar to the 
vacancy to be filled. In this way, the growing complexity of 
the business was fairly well met at a cost consistent with the 
conditions of employment. On the introduction of Panel Type 
Dial Central Offices, with their complicated circuits and deli- 
cate equipment, it became apparent that the personal inter- 
view in itself was insufficient to determine the applicant’s fit- 
ness for a complicated job and a series of simple written and 
mechanical tests covering general aptitude was introduced to 
substantiate or supplement the judgment of the interviewer. 
While this innovation increased the percentage of satisfactory 
selections considerably in the case of potential Dial Central 
Office Maintenance men (Switchmen), it was apparent that a 
more accurate and reliable method would be desirable. 
Coupled with the growing need for the selection of the proper 
individual for specific jobs, this recognition may well be inter- 
preted as the purpose of this study. 

More specifically, the objective was to construct a reliable 
and valid test battery which would supplement the judgment 
of the interviewer in the selection of those to be trained for the 
job of Switchman. 


SUBJECTS 


The results of a total of 50 subjects were used in the con- 
struction of the test battery. These subjects represent an 
average cross-section of applicants who presented themselves 
at the employment office of the New York Telephone Company. 
They offer various degrees of training ranging from elemen- 
tary grade to college level of study. No attempt was made to 
categorize the applicants on the basis of education or skilled 
training in other fields. 


PROCEDURE 


All subjects received the same preliminary training in the 
Panel Dial School before starting work as switchmen. This 
training was initiated by a review course in the Elementary 
Principles of Electricity. The course consisted of three parts: 
(1) ferromagnetism and electromagnetism, (2) alternating 
current circuits and (3) current calculations. Here it was 
observed that a knowledge of arithmetic was essential for 
success. Of assistance to the student taking the course was 
any general information concerning the subject matter that 
he may have acquired by academic training, home study or 
any other means. 

Following the course in principles of electricity the subjects 
were given a circuit study course which taught the application 
of the material learned in the first course to a collection of 
equipment symbolized as wiring diagrams and schematic 
drawings. The subjects learned to ‘‘trace’’ the wiring systems 
of the various pieces of equipment and to associate the 
diagrammatic representations with the original apparatus 
located in the Panel Dial Central Office. The term ‘‘trace’’ 
embraces the process of following a line representing a wire 
until a unit of equipment or an open contact is reached. When 
an open contact is reached, the line must be retraced to the 
starting point and a new path chosen. This process readily 
simulates maze learning of a comparatively high order of com- 
plexity. Circuit description sheets are provided as an aid to 
‘‘tracing’’ and learning circuit function. 
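
The tracing procedure described above can be sketched as a search that restarts whenever an open contact is met. The sketch is illustrative only: the circuit layout, the point names, and the ``unit:`` and ``open:`` prefixes are hypothetical conventions adopted for the example, and the routine is not drawn from the Bell System Practices.

```python
def trace(circuit, start):
    """Follow wires from `start` until a unit of equipment is reached.

    `circuit` maps each point to the points its wires lead to. Terminal
    points are strings beginning with "unit:" (equipment) or "open:"
    (open contact). Returns the successful path and the number of times
    an open contact forced a retrace to the starting point.
    """
    retraces = 0
    untried = [[start]]  # candidate paths, each beginning at the start
    while untried:
        path = untried.pop()
        point = path[-1]
        if point.startswith("unit:"):
            return path, retraces
        if point.startswith("open:"):
            retraces += 1  # back to the starting point, choose a new path
            continue
        # Push branches in reverse so the first-listed wire is tried first.
        for nxt in reversed(circuit.get(point, [])):
            if nxt not in path:  # never follow a wire back into the path
                untried.append(path + [nxt])
    return None, retraces


# Hypothetical wiring: the first branch dead-ends at an open contact.
circuit = {
    "start": ["a", "b"],
    "a": ["open:1"],
    "b": ["c", "open:2"],
    "c": ["unit:relay"],
}
path, retraces = trace(circuit, "start")
print(path, retraces)
```

The count of retraces corresponds to the errors usually scored in maze learning, which is the resemblance the text draws on.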

As a corollary to circuit study, short ‘‘Relay and Apparatus 
Courses’’ were given in which the subjects were taught to 
operate and adjust the various pieces of equipment using spe- 
cific adjustment instructions issued by Bell Telephone Labora- 
tories and called Bell System Practices. Upon completion of 
these courses, the subjects were sent as apprentices to a Cen- 
tral Office to work on the equipment studied. 

The construction of the test battery was based on the principles 
formulated by Hull (2) in the introduction to his book 
on Aptitude Testing. ‘‘The most accurate method of deter- 
mining the aptitude of an individual for a vocation or other 
activity is the test of life itself. The ultimate test must always 
be the learning of the vocation in the ordinary manner, after 
which we may observe the degree of the individual’s profi- 
ciency when he has reached the limit of his training.’’ 

The training period in the Panel Dial School was a fertile 
source for test information. The review course in elementary 
principles of electricity, divided into (1) ferromagnetism and 
electromagnetism, (2) alternating current circuits and (3) 
current calculation, suggested components for the battery. To 
test this first stage of training a series of 40 items was com- 
piled. It included 31 multiple choice statements based on 
ferromagnetism, electromagnetism and alternating current 
circuits and 9 problems concerning current calculation. Although 
given to the subjects as one test, the scoring was 
divided into three parts. Test 1 included units 1-14; Test 2, 
units 14-29; Test 3, units 29-40. Closely associated with these 
tests was the fourth—an arithmetic test containing 8 problems 
in addition, subtraction, multiplication and division of frac- 
tions and decimals. 

The courses in circuit study and relays and apparatus in- 
volving circuit tracing and use and adjustment of apparatus 
were a source of 5 additional tests. Tests 5 and 6 are alike in 
two ways: first, they are both tests of mechanical aptitude as 
judged by knowledge of apparatus function and adjustment 
and, secondly, they both concern the same piece of apparatus. 
Test 5 consists of a series of 10 questions based on a diagram 
of a 4-C Buzzer which has its various parts indicated by 
arrows and numbers. A complete understanding of mechanical 
function is necessary to answer the questions. Test 6 is 
made up of 2 types of questions—multiple choice and direct 
response, and has 12 parts. It is based on instructions for the 
adjustment of apparatus. The instructions have been prepared 
by the Department of Development and Research of the 
Bell Telephone Laboratories, Inc., and are known as Bell Sys- 
tem Practices (Apparatus Requirements and Adjustment Pro- 
cedures). 

Circuit function and circuit ‘‘tracing’’ were more difficult 
to prepare in test form. General intelligence indicators 
seemed to offer little forecasting ability for this aptitude. Men 
with electrical engineering degrees were as apt to fail at this 
point as were more poorly equipped individuals. Tracing is 
fundamentally trial and error learning. Differing only in 
degree of complexity from the lower form studied in animals, 
it should be readily amenable to the same form of measure- 
ment used in animal studies—the maze. To test this aptitude, 
therefore, three mazes were incorporated in the battery as 
Tests 7, 8 and 9. The subjects were required to trace through 
from start to finish of the 3 mazes which are successively more 
complicated and difficult. 

The battery of tests was given to students upon their 
entrance into the Panel Dial School. The testing was conducted 
under rigidly controlled conditions as regards 
instructions, time limit and scoring. 


CRITERION SCORE 


Since the occupational activity of the training period does 
not lend itself readily to accurate measurement as behavior or 
product of behavior, the criterion score to be used as a basis of 
comparison for the aptitude testing was determined by the 
ranking method described by Hull (2, 374) and classified by 
him as ‘‘subjective impression criterion.’’ The ranking was 
done by the instructors in the Dial School. Since each instruc- 
tor was in personal contact with each student at least once 
each day, circuit instruction being done on an individual basis, 
their intimate observations were valuable in the determination 
of the criterion. Weekly reviews of the standing of the students 
were held under the direction of the Supervisor of Training. 
Inasmuch as this was essentially the method of determining 
whether or not the student was successful in the school 
before the test battery was considered, the criterion score thus 
secured was deemed reliable. This procedure was continued 
until 50 men who had had no previous Panel Dial training had 
completed their schooling and had been ranked by the Faculty 
and Supervisor. Where any doubt existed as to the student’s 
place in the ranked series, as might occur when two or more 
instructors disagreed as to a student’s ability, the student was 
dropped from the study group. The ranked series of 50 stu- 
dents was then transmuted into units of amount on an ordi- 
nary 10 point scale and served as the criterion score for this 
study. 


RESULTS 


The computation of correlation coefficients conveniently 
arranged in Table 1 showed that all tests have a high correla- 


TABLE 1 

Showing the correlation coefficients of the test battery, arranged for 
convenient examination 

Variable                           Variable numbers 
             0      1      2      3      4      5      6      7      8 
    1      .420 
    2      .456   .639 
    3      .462   .622   .655 
    4      .518   .479   .592   .472 
    5      .347   .518   .453   .454   .431 
    6      .486   .582   .546   .534   .564   .609 
    7     -.422  -.289  -.093  -.260  -.393  -.437  -.449 
    8     -.232  -.251  -.111  -.246  -.411  -.429  -.480   .757 
    9     -.257  -.279  -.226  -.292  -.345   .034  -.309   .115   .280 





tion with the criterion score and a rather high intercorrelation. 
The latter might have been expected in view of the 
broad aptitudes for which each test was designed. Since the 
tests can be given at a low cost and are easily scored, and 
therefore contribute enough to the prognostic efficiency of the 
battery as a whole to repay costs incidental to their use, they 
were all retained. Further evidence in favor of retaining 
these tests, in spite of high intercorrelation, lies in the fact 
that these tests which have the highest intercorrelation coeffi- 
cients (Tests 1, 2, 3, 4, 5 and 6) show stronger correlation with 
the criterion score than do the three (Tests 7, 8 and 9) with 
lower intercorrelation. 

As the next step, the tests were combined in such a way as 
to yield the best possible aptitude prediction. This step was 
effected by statistical weighting of each test by use of the 
multiple regression equation as devised mainly by Tolley and 
Ezekiel (3) and described by Hull (2). The equation devised 
was 


$$X_0 = -.21X_1 + .66X_2 + .61X_3 + 1.88X_4 + .007X_5 + 1.09X_6 - .83X_7 + .53X_8 - .08X_9 + 23.70$$ 


The correlation between the test battery thus weighted and the 
criterion was derived by the following procedure: 


(1) $R = \sqrt{\dfrac{P_{01}W_1 + P_{02}W_2 + P_{03}W_3 + P_{04}W_4 + \text{etc.}}{P_{00}}}$ 
(2) $R = .6778$ 
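
Formula (1) can be illustrated with a short sketch. It assumes that the P terms are sums of products of deviations from the means and that the W terms are the least-squares regression weights; that reading, the function names and the toy scores below are assumptions made for illustration and are not the study's data.

```python
import math

def cross_product(a, b):
    """Sum of products of deviations from the means (a P term)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b))

def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def multiple_r(criterion, tests):
    """Return (weights, R) with R = sqrt(sum(P_0i * W_i) / P_00)."""
    p0 = [cross_product(criterion, t) for t in tests]            # P_01, P_02, ...
    pij = [[cross_product(a, b) for b in tests] for a in tests]  # normal equations
    weights = solve(pij, p0)                                     # W_1, W_2, ...
    p00 = cross_product(criterion, criterion)
    r = math.sqrt(sum(p * w for p, w in zip(p0, weights)) / p00)
    return weights, r

# Hypothetical scores for eight men on two tests and a criterion rating.
test_a = [1, 2, 3, 4, 5, 6, 7, 8]
test_b = [3, 1, 4, 1, 5, 9, 2, 6]
rating = [2, 3, 5, 4, 6, 9, 6, 8]
weights, r = multiple_r(rating, [test_a, test_b])
print([round(w, 3) for w in weights], round(r, 4))
```

The same routine extends to nine tests by passing nine score lists; only the size of the normal equations changes.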
The multiple correlation coefficient is .6778 with a coefficient 


TABLE 2 

Showing score range and per cent of failures of 222 men tested by 
the new test battery 

Score range     Total     Failure     Per cent failure 
  0-9              3          3           100.0 
 10-19             6          5            83.0 
 20-29            10          7            70.0 
 30-39            25          9            36.0 
 40-49            52         10            19.0 
 50-59            76          8            10.5 
 60-69            42          1             2.4 
 70-79             8          0             0.0 
of reliability of .853. This corresponds to a forecasting efficiency 
of 27.2 per cent. 

The test battery has been given to 222 men who have been 
trained and released to Central Offices as switchmen or have 
been transferred because of failure to other departments. 
Table 2 shows the percentage of failures according to score 
range of these 222 men. 

The Panel Dial Aptitude Test Battery has proven to be 
exceptionally effective in predicting success in the Dial School. 
Inasmuch as new employees hired for this craft would have to 
pass successfully through the Dial School, this test battery 
can be used as an effective employment test. From a study of 
the 222 men who have taken the test, we can predict that, with 
a critical score set at 50, we would have 14 failures out of 
every 135 men employed. If the critical score were set at 60, 
there would be employed only 1 failure out of every 50 men 
selected. It must be remembered that to secure 109 men with 
a score of 50 or better, or 42 men with a score of 60 or better, 
would require testing 222 men, if the applicants at the em- 
ployment bureau were of the caliber of the students sent to the 
Dial School during the period in which the tests were 
employed. 
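
The effect of a critical score can be tabulated directly from the rows of Table 2. The sketch below is only an illustration: the function names are hypothetical, and it assumes the critical score falls on a band boundary (a multiple of 10), since the table does not report scores within a band.

```python
# Rows of Table 2: (lowest score in band, men tested, failures).
TABLE_2 = [
    (0, 3, 3), (10, 6, 5), (20, 10, 7), (30, 25, 9),
    (40, 52, 10), (50, 76, 8), (60, 42, 1), (70, 8, 0),
]

def selected_at(critical_score, rows=TABLE_2):
    """Return (men at or above the critical score, failures among them)."""
    men = sum(total for low, total, _ in rows if low >= critical_score)
    failures = sum(failed for low, _, failed in rows if low >= critical_score)
    return men, failures

def band_failure_rates(rows=TABLE_2):
    """Per cent of failures within each score band, keyed by the band's low end."""
    return {low: round(100 * failed / total, 1) for low, total, failed in rows}

tested = sum(total for _, total, _ in TABLE_2)
men, failures = selected_at(60)
print(f"{tested} tested; critical score 60 admits {men} men, {failures} failure(s)")
```

Raising the critical score lowers the failure rate among those selected but shrinks the number selected, which is the trade-off the paragraph above describes.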


SUMMARY AND CONCLUSIONS 


1. A battery of 9 tests was prepared and given to 50 sub- 
jects. 

2. Statistical analysis showed a high correlation between the 
individual tests as well as high correlation of the tests with the 
criterion score. 

3. The tests were weighted and computation revealed a 
multiple correlation coefficient of .6778 with the criterion 
which corresponds to a forecasting efficiency of 27.2 per cent, 
a value much higher than yielded by most aptitude tests. 

4. Although the tests of the battery are by design broad in 
function and tend to measure overlapping function, they were 
not radically changed because of the simplicity of administration 
and scoring and the relatively low cost involved. 
INTRODUCTION 


BECAUSE of its simplicity of administration and ease of 
scoring, the Otis Self-Administering Test has been extensively 
employed for the testing of adults both in vocational and 
educational guidance and in selection procedures for 
industrial and business employment. Pintner (12) has described 
the test as ‘‘particularly good for the testing of educated 
adults’’ (p. 203). But despite the widespread use and 
strong endorsement of the test, in our opinion its adequacy 
for adult groups has never been established. 

The standardization of the test for adults has been severely 
criticized by Boynton (1), although his criticisms are largely 
on a priori grounds. Extensive use of the test by the writers 
during the past six years has disclosed a number of technical 
criticisms meriting further analysis. Preliminary study of the 
data obtained in this testing program suggested that the 
examination is too easy for adults and that the items are very 
inadequately arranged in order of difficulty. The present 
study investigated these points in detail upon all four forms 
of the higher examination. The subjects were obtained from 
a variety of adult industrial and educational populations and 
represented a wide range of ability. The study was especially 
concerned with the following aspects of the tests: (1) the percentage 
of subjects completing the examination; (2) the difficulty 
of the test items; and (3) the arrangement of the items 
in the proper order of difficulty. The adequacy of the reliability, 
validity and norms of the test was not investigated. 
Data on these problems have been reported by Broom (2), 
Chapanis (3), Guiler (5), Otis (11), Traxler (15) and others. 

1 The writers wish to thank Drs. S. N. Stevens, L. N. Vernon and A. 
W. Brown for their co-operation in the administration of the tests, and 
Dr. A. S. Otis for reading of the manuscript. 

It should be emphasized at the outset that while the Higher 
Examination is widely used for testing adult groups, it was 
not originally intended solely for adults, and that any criticism 
of its adequacy for such individuals does not necessarily 
reflect on its suitability for younger populations. 


TEST GROUPS 


The four forms (A, B, C and D) of the Otis Self-Administering 
Test of Mental Ability (Higher Form) were administered 
to 8,867 adults ranging in age from 14 to 50 years. The 
majority of the subjects were in the age range from 21 to 30 
years. The administration of all of the tests was personally 
supervised and conducted according to the procedure outlined 
in the Manual of Directions and Key published by Otis. The 
test was sometimes given individually, sometimes to groups up 
to 150 in number.2 To permit a complete analysis, the tests 
were administered with both 20 and 30 minute time limits. 
Six different populations were employed: 

Group I. Applicants for positions involving public contact 
work at Household Finance Corporation, a personal loan or- 
ganization. Age range 19-28. 

Group II. Freshman, Sophomore, Junior and Senior university 
students at Northwestern University. Age range 17-25. 

Group III. Junior and Senior students at Evanston Town- 
ship High School. Age range 14-19. 

Group IV. Applicants for position of guide at Chicago 
World’s Fair, 1933. Age range 18-30. 


2 Whether these tests are given individually or in groups has been 
shown by Krueger (9) and others to have little effect upon the scores. 
Group V. Applicants for educational and recreational jobs 
supervised by Federal Works Progress Administration. Age 
range 19-50. 

Group VI. Young adults in federal Civilian Conservation 
Corps camps. Age range 18-25. 

The populations of the various groups taking the tests in 
the four forms are shown in the table below: 

                          Form 
Group          A        B        C        D 
30 minute time limit 
I            2057               415 
II            217      125      251      215 
III           285               192 
IV            159 
V             989               981 
VI                              200 
20 minute time limit 
II            125      193      136      184 
IV            855      322      308      148 
V             279               231 







RESULTS AND DISCUSSION 


Number of Questions Attempted 


In a time-limited test, such as the Otis, it is essential that a 
relatively small number of individuals finish. Thus: ‘‘The 
rule upon which the time limits of alpha were originally based 
was that no more than 5 per cent of an unselected group 
should complete all items on any test’’ (17, p. 419). In 
analyzing the adequacy of the present test for adults in this 
respect, the percentages of subjects finishing varying numbers 
of questions were first computed. The number of questions 
attempted by each subject was measured by the last ques- 
tion to which an answer was given. This assumes that the 
subjects followed directions, trying to answer the questions in 
the order in which they are presented in the examination. 











370 CARL IVER HOVLAND AND E. F. WONDERLIC 


TABLE 1 

Percentages of College Students (Group II) of Equal Ability Attempting 
Various Numbers of Questions on Four Forms in 30 and 20 
Minute Time Limits 

No. quests.               Percentage of subjects 
attempted                         Form 
                        A        B        C        D 
30 minute time limit 
40-42                  1.5      ...      ...       .5 
43-45                  1.5      ...      3.0       .5 
46-48                  1.5      ...      ...      1.8 
49-51                  ...       .8      3.8      1.8 
52-54                  4.6      3.2      4.5      2.8 
55-57                  7.4      5.6      6.0      4.2 
58-60                  4.6      2.4      6.0      3.3 
61-63                 10.8     11.2      6.7     12.4 
64-66                 14.2      9.6      9.8      8.0 
67-69                  3.1      4.8      5.3     11.2 
70-72                  7.7     12.8      6.8     14.9 
73-75                 43.1     46.4     48.1     38.6 
Number of subjects     217      125      251      215 
20 minute time limit 
25-27                  ...      ...      ...      ... 
28-30                  ...      1.0      ...      ... 
31-33                  ...      2.1       .6      ... 
34-36                  ...      1.0      1.2      ... 
37-39                  1.2      5.4      3.4      2.2 
40-42                  7.3      7.5     11.0      4.8 
43-45                  9.8      5.4      7.4      5.4 
46-48                  7.3     10.8      7.4      9.6 
49-51                 12.2      4.3     11.0      4.8 
52-54                  7.3      6.5     10.3      9.6 
55-57                 11.0     18.3     15.4      9.0 
58-60                  8.5      3.2      7.4      8.5 
61-63                  4.9      8.6      5.9     13.0 
64-66                  8.5      8.6      4.4      4.8 
67-69                 11.0      6.5      4.4      7.6 
70-72                  6.1      5.4      2.2      7.6 
73-75                  4.9      5.4      7.4     13.1 
Number of subjects     125      193      136      184 
Observation by the examiners has verified this assumption. 
The percentages of subjects attempting various numbers of 
questions are given for the four forms with the customary 30 
minute and also with a 20 minute time limit in Table 1. Col- 
lege students (Group II) of equated ability were the subjects 
tested. The data of Table 1 are presented graphically in 
Figure 1. It will be observed from the figure that the dis- 
tribution of number of questions attempted is markedly 
skewed when the 30 minute time limit is employed. An ex- 
tremely large percentage of the subjects complete 73, 74 or 75 
questions in the half-hour period. The percentages of sub- 
jects completing the entire examination in 30 and in 20 
minutes are given in Table 2. The fluctuation from form to 


TABLE 2 

Percentages of Subjects Completing Examination on Four Forms in 30 
and 20 Minute Time Limits. Subjects Identical with 
those of Table 1 (Group II) 

                                       Form 
                                A     B     C     D 
30 minute time limit 
Percentage of subjects 
completing test                39    22    38    32 
20 minute time limit 
Percentage of subjects 
completing test                 2     3     4    10 





form is to be attributed to the variation in the difficulty of the 
last item in the four forms. (See data concerning item analy- 
sis of difficulty below.) The data of Tables 1 and 2 and 
Figure 1 with respect to the 20 minute time limit show that a 
much smaller proportion of the subjects complete the examination 
in this time and hence less undermeasurement is present.3 
The thirty minute time limit is, however, much more 
extensively employed, even with adults. 

3 A 15 minute time limit on the Otis test has been successfully used by 
Miles (10). Presumably, a very small percentage of the subjects complete 
the test in this period. 
There is a marked difference in the per cent of subjects 
within the various groups who complete the examination. The 
proportion of subjects who complete the test and are conse- 
quently inadequately tested increases with the ability level of 
the group. This is to be seen from Table 3 where the mean 


TABLE 3 

Average Scores of Various Groups and Corresponding Percentages of 
Subjects in each Group Completing Test in 30 Minute 
Time Interval 

                                       Per cent of subjects 
Group        N        Mean score       completing test 
Form A 
II          217         55.26              39.12 
III         285         52.13              37.09 
I          2057         51.75              22.23 
V           989         49.05              22.80 
Form C 
II          251         53.61              38.15 
III         192         50.97              36.24 
I           415         49.41              22.93 
V           981         45.78              23.65 





scores and the corresponding percentage of subjects com- 
pleting the examination are given for typical groups of vary- 
ing ability taking forms A and C with the 30 minute time 
limit. The correlations between the score on the test and the 
number of items attempted will be presented below. 

The mean scores of Group III are very close to those origi- 
nally reported by Otis (11) for college students. But even 
for this group 36 or 37 per cent of the subjects complete the 
examination. It is apparent that when this large a percentage 
of subjects complete the test in 30 minutes the examination in 
its present form is highly unsatisfactory for the groups tested, 
since individual differences are obscured as a result of con- 
siderable undertesting of the ability of some individuals. A 
TABLE 4 

Relationships between Number of Questions Attempted, Number Correct 
and Number Missed on Four Forms of Test with 30 
and 20 Minute Time Limits 

Variables correlated                            r      P.E.      N     Group 
30 minute time limit 
Form A 
No. quest. correct and no. attempted          +.666    .008    2274    I and II 
No. quest. correct and no. missed             -.515    .010 
No. quest. attempted and no. missed            ...      ... 
Form B 
No. quest. correct and no. attempted          +.651    .034     125    II 
No. quest. correct and no. missed             -.551    .042 
No. quest. attempted and no. missed           +.211    .058 
Form C 
No. quest. correct and no. attempted          +.714    .013     666    I and II 
No. quest. correct and no. missed             -.568    .017 
No. quest. attempted and no. missed            ...      ... 
Form D 
No. quest. correct and no. attempted          +.608    .029     215    II 
No. quest. correct and no. missed             -.514    .034 
No. quest. attempted and no. missed           +.232    .043 
20 minute time limit 
Form A 
No. quest. correct and no. attempted          +.723    .010     980    II and IV 
No. quest. correct and no. missed             -.224    .020 
No. quest. attempted and no. missed           +.461    .017 
Form B 
No. quest. correct and no. attempted          +.617    .018     515    II and IV 
No. quest. correct and no. missed             -.551    .021 
No. quest. attempted and no. missed            ...      ... 
Form C 
No. quest. correct and no. attempted          +.684    .017     444    II and IV 
No. quest. correct and no. missed              ...      ... 
No. quest. attempted and no. missed           +.366    .028 
Form D 
No. quest. correct and no. attempted          +.630    .022     332    II and IV 
No. quest. correct and no. missed             -.517    .027 
No. quest. attempted and no. missed            ...      ... 
closely related point regarding the average difficulty of the 
test items will be discussed below. 


Relationships between Number of Questions Attempted, 
Number Correct and Number Missed 


It was of interest, in connection with the analysis of the test, 
to compute the intercorrelations between these various func- 
tions. In studies, such as that of the writers (7) on the effect 
of old age upon intelligence test scores, these correlations are 
essential in partialling out of the loss in score with age, the 
amount due to decreased speed and the amount due to de- 
creased power. Correlations between these measures are pre- 
sented in Table 4 for both 20 and 30 minute time limits. The 
average correlation between the number correct and the num- 
ber attempted is about +.65 for both the 30 and 20 minute 
time intervals. This supports the positive relationship ob- 
served in Table 3. An inverse correlation exists between the 
number correct and the number missed of slightly more than 
-.50. Between the number of questions attempted and the 
number missed there is a positive correlation of about + .20 
with the 30 minute form but over +.30 with the 20 minute 
limit. Presumably the lower correlation with the longer time 
limit is the result of the reduced range in the distribution of 
items attempted. The eta’s throughout the series tend to 
differ unreliably from the r’s. 

As a check on these correlations the partial correlations were 
computed between the number of questions correct and the 
number missed with the number attempted held constant. 
One would expect to obtain a perfect inverse correlation under 
these circumstances. The actual partial correlations are all 
above -.90 in magnitude. Part of the discrepancy between these values and 
the predicted perfect negative correlations is caused by slight 
departures from linearity in the regressions. Perfect positive 
correlations are to be expected between the number of items 
correct and the number attempted with number missed held 
constant and between number attempted and number missed 
with number correct held constant. 
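
Why perfect partial correlations are expected can be seen from a small sketch. It assumes the standard first-order partial correlation formula and the identity attempted = correct + missed; the scores are hypothetical and the function names are illustrative only.

```python
import math

def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_r(x, y, z):
    """Correlation of x and y with z held constant (first-order partial)."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical scores; every attempted item is either correct or missed.
correct = [50, 42, 60, 35, 55, 48]
missed = [10, 20, 5, 15, 12, 9]
attempted = [c + m for c, m in zip(correct, missed)]

print(round(partial_r(correct, missed, attempted), 6))   # attempted held constant
print(round(partial_r(correct, attempted, missed), 6))   # missed held constant
```

With the identity holding exactly, the first value is a perfect inverse correlation and the second a perfect positive one, apart from floating-point rounding. Departures from these values in obtained data therefore reflect rounding of the r's or nonlinearity in the regressions, as noted above.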
Order of Difficulty of Test Items 


Otis states that he followed the customary procedure of 
arranging the items in the test in order of difficulty: ‘‘The 
items in each form of each examination have been arranged 
in the order of difficulty, according to the number of passes of 
each item by the students taking the preliminary editions.’’ 
(11, p. 3.) 

To check the adequacy of this standardization for adult sub- 
jects the percentages of subjects passing each item were first 
computed. The ratio (in per cent) of the number of subjects 
giving correct answers to the number of subjects attempting 
each question was used as the measure of difficulty of the individual 
test items. As before, it was assumed that the subjects 
followed the printed instructions of the examination and at- 
tempted the questions in the order presented so that the last 
question marked or answered was the last question attempted 
by the subject. Data on the percentage of subjects in Group 
V passing each item are given in Table 5 for forms A and C 
with the 30 and 20 minute time limits. Table 6 gives the cor- 
responding data for Group II on all four forms for the two 
time limits. From these data the rank in difficulty of each 
question was determined, which could then be compared with 
the order assigned to that item in the printed test. Thus, if 
question number 1 in Form A was the 26th in actual difficulty, 
the discrepancy in rank would be 25. Rank order correlations 
between the order of the item in the test and its difficulty are 
given in Table 7 for various groups taking the four forms of 
test with the two times of administration. The correlations 
(ρ) range from +.42 to +.75. Thus the tendency for the questions 
to be arranged in order of difficulty with the first question 
easiest and the last most difficult is only from 9 to 34 per 
cent above chance (Index of forecasting efficiency (Hull, 8)). 
Whether this disparity between the order of difficulty obtained 
and the order in which the items were placed by Otis is only 
true in the case of adults is not indicated by our data. The 
TABLE 5 

Percentages of Subjects Passing each Individual Item on Forms A and C 
with 30 and 20 Minute Time Limits (Group V, N’s = 989-Form 
A 30 min.; 279-Form A 20 min.; 981-Form C 30 min.; 
231-Form C 20 min.) 

Form A Form C 
1 95 95 39 79 76 8 39 «685—es 856 
2 80 ‘72 40 80 73 = 3838 81 40 74 65 
3 84 84 41 74 67 3 92 90 41 7 62 
4 98 98 42 69 61 4 90 87 42 70 61 
5 89 87 438 62 49 ,: Te Fé 43 59 49 
6 83 82 44 27 21 6 85 86 44 30 25 
. 2 45 58 45 , te % 45 59 42 
8 81 80 46 85 83 8 81 80 46 89 88 
a SS a 7 9 62 63 47 78 % 
10 96 91 48 73 75 10 85 86 48 94 94 
11 90 87 49 58 61 [ =. 49 76 79 
12 91 88 50 57 56 18 88 79 50 55 57 
18 98 9290 51 84 47 mw w Ti 51 46 36 
14 83 82 52 84 86 14 63 59 S82 7 78 
15. 88 75 53 92 89 15 78 65 53 86 83 
16 91 90 54 67 64 16 83 80 54 66 59 
17 90 84 55 66 70 17 2 98 55 62 65 
18 87 88 56 66 60 18 68 73 56 42 40 
19 9 93 57 68 76 19 59 60 57 56 56 
20 84 84 58 39 39 20 92 93 58 39 23 
310 6 G8. Sl 59 30 40 21 45 35 59 74 70 
22 83 79 60 54 55 23 7 75 60 69 60 
23 86 = 88 61 5&5 57 28 85 85 61 S38 4 
ae ee | 62 64 653 oS: 7% 62 61 60 
25 69 #464 63 80 81 25 67 72 63 90 90 
26 92 90 64 61 57 26 84 86 64 45 37 
27 97 96 65 83 76 27 95 96 65 82 81 
28 91 88 66 55 49 28 94 97 66 46 46 
29 93 91 67 35 33 29 51 #50 67 381 25 
30 83 82 68 49 43 30 79 80 68 33 46 
| De 69 62 81 31 69 67 69 52 47 
32 96 92 70 35 37 32 67 «74 70 40 22 
33 80 77 71 85 95 33 «658—COS8 ae ae 
34 97 96 72 46 565 34 78 79 72 6&2 42 
35 86 80 73 S51 55 35 88 75 7338 98 84 
36 89 91 74 656 65 36 98 98 eS: WF 
37 58 54 75 38 47 37 SS 47 75 36 36 
38 72 63 38 70 59 
TABLE 6 
Percentages of Subjects Passing each Individual Item on Forms A, B, 
C, D with 30 and 20 Minute Time Limits 
(Group II, N’s=217-Form A 30 min.; 125-Form B 30 min.; 251-Form C 
30 min.; 215-Form D 30 min.; 125-Form A 20 min.; 193-Form B 
20 min.; 136-Form C 20 min.; 184-Form D 20 min.) 














Form Form 
A B C D A B C D 
30 minute time limit 

1 9 999 «93 84 39 «89 «686 —C(9téD*D 
2 88 86 90 88 40 84 96 76 ~&# 89 
S oe oe 41 80 76 98 ~ 87 
4 99 86 96 82 42 73 80 84 100 
S 6 ae ee 48 #79 #352 +72 #8 59 
6 9 8 87 99 44 34 «690«89t—«*70” 
7 96 96 «89 94 45 59 39 OF 15 
s “=. * @ 46 92 48 92 85 
9 97 88 68 88 47 86 «84 «= 88t—é«é‘CTL 
10 100 98 90 78 48 72 56 997 62 
11 96 90 £9 «97 i er 
12 9 94 «90 90 50 60 86 66 66 
13 9 7 #%7 «67 51 54 64 «55 98 
14 90 98 68 95 52 900 «85—ial—(ité«éCG 
15 92 8&8 8 2 53 89) «58 i86—~«SB 
16 99 97 98 91 54 8ls8AhCiCi7GC—“‘«i‘«‘ RD 
17 96 86 96 84 55 OA$Ci‘i SG CiCSC“‘i‘CA 
18 9 73 #72 «78 56 O70)0C6Si‘iNC“‘<‘éSTO 
19 95 91 468° 82 57 71 60 64 68 
2 #7 88 6 79 2 ea Ss. @ 
21 66 97 «50 94 a a en, ee 
22 «96)~«=C89té«iT_StiéCSB 60 62 91 78 # 7 
23 4994 82 98 81 61 63 95 68 99 
2 8©=6 881i 6t—“i«éiH~CtéKsdCS 62 65 71 67 ~~ 239 
2 7 9 # «17 76 68 90 58 95 # 78 
2 «#495 «92 @=©6tsé8 64 #59 «#357 «2650 
27 «99 76—‘z TtSté«C'C 65 78 61 90 54 
28 4«693—C=«iTtié«iat« 66 4638:—«COC(‘<ié‘ik OCS 
22 «#499 «78 ~=—61l_StéiTT 67 2 73 386 78 
30 91 63 90 87 68 44 27 +44 ~ 27 
31 92 82 #71 78 69 63 #73 466 ~~ 32 
32 100 +72 #479 = 98 70 42 #78 44 7% 
330 8hiCiktiaTD]ts«é 71 95 45 86° 68 
To oe a 722 50 89 «659 ~—tOB 
35 94 67 «+91 84 733 89 «#466 «294 ~~] 
36 «2 96—i79)—s«i00—St—«é#B83B 7% 68 6 82 ~ 61 
37 4 66—i86s—“«é«iTs~Ci‘ié‘«é 7 8 #47 «46. (20 
38 480 56, 87 86 
TABLE 6 (Continued) 
Form Form 
A B C D A B C D 
20 minute time limit 
1 98 100 97 92 39 88 83 81 83 
2 80 83 91 86 40 76 93 82 88 
3 91 92 94 93 41 79 80 83 82 
4 88 75 97 82 42 68 86 70 99 
5 89 81 74 86 43 69 56 77 60 
6 90 75 88 99 44 26 63 42 55 
7 87 96 83 99 45 53 42 60 10 
8 88 77 89 92 46 82 44 92 86 
9 91 87 71 84 47 83 76 81 69 
10 96 98 87 81 48 78 47 96 64 
11 96 91 90 93 49 72 62 92 44 
12 93 90 88 92 50 61 82 58 56 
13 97 69 73 67 51 47 60 48 92 
14 93 95 65 94 52 96 90 77 73 
15 82 77 82 31 53 86 55 94 59 
16 98 92 92 92 54 74 75 69 25 
17 97 82 98 89 55 69 83 70 86 
18 94 78 77 83 56 64 63 51 74 
19 93 83 63 87 57 75 56 59 67 
20 80 78 94 73 58 50 43 29 69 
21 62 92 42 94 59 42 56 73 33 
22 96 86 84 65 60 59 91 68 76 
23 93 82 88 89 61 52 94 54 99 
24 79 92 83 95 62 64 70 55 29 
25 66 90 65 82 63 89 58 93 77 
26 91 85 94 87 64 56 58 48 19 
27 99 73 100 56 65 65 52 83 54 
28 96 96 99 93 66 32 45 32 87 
29 98 76 56 98 67 22 81 22 79 
30 91 62 85 60 68 43 30 32 28 
31 88 79 69 59 69 67 70 47 38 
32 100 68 71 98 70 47 70 28 76 
33 83 89 62 95 71 86 34 77 70 
34 99 87 85 95 72 50 84 43 100 
35 90 58 88 78 73 50 60 90 75 
36 97 81 100 87 74 54 53 67 45 
37 60 88 61 94 75 50 30 53 21 
38 77 61 79 86 
TABLE 7-A 

Correlations (Rho) between the Assigned Ranks of Questions and the 
Obtained Order of Difficulty for Various Groups on the Four 
Forms with 30 and 20 Minute Time Limits 

          Time limit                                 Inferred 
Form      (in min.)        N       Rho      P.E.        r 
Combinations involving Groups I, II, or IV 
A           30           2433      .75      .03        .76 
            20            980      .70      .04        .72 
            Comb.        3413      .74      .04        .76 
B           30            125      .59      .05        .61 
            20            515      .58      .05        .60 
            Comb.         630      .60      .05        .62 
C           30            666      .49      .06        .51 
            20            444      .51      .06        .53 
            Comb.        1110      .48      .06        .50 
D           30            215      .46      .06        .48 
            20            332      .45      .06        .47 
            Comb.         547      .44      .06        .46 
Group III 
A           30            285      .73      .04        .75 
C           30            192      .56      .05        .58 
Group V 
A           30            989      .70      .04        .72 
            20            279      .59      .05        .61 
C           30            981      .47      .06        .49 
            20            231      .42      .06        .44 
Group VI 
C           30            200      .51      .06        .53 





order does, however, stay relatively constant for the various 
groups which were tested (see below). 
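
The quantities used in this comparison can be sketched in a few lines. The block is an illustration under stated assumptions: rho is computed by the usual rank-difference formula without ties; the "inferred r" column of the tables is reproduced with Pearson's conversion r = 2 sin(πρ/6), which agrees with the tabled values but is not named in the text; and the index of forecasting efficiency is taken as 1 - sqrt(1 - r²). The function names and the six pass percentages are hypothetical.

```python
import math

def difficulty_ranks(percent_passing):
    """Rank items from easiest (1) to hardest; assumes no tied percentages."""
    order = sorted(range(len(percent_passing)), key=lambda i: -percent_passing[i])
    ranks = [0] * len(percent_passing)
    for rank, item in enumerate(order, start=1):
        ranks[item] = rank
    return ranks

def spearman_rho(percent_passing):
    """Rho between printed position (1, 2, 3, ...) and obtained difficulty rank."""
    n = len(percent_passing)
    ranks = difficulty_ranks(percent_passing)
    d_squared = sum((pos - rank) ** 2 for pos, rank in enumerate(ranks, start=1))
    return 1 - 6 * d_squared / (n * (n * n - 1))

def inferred_r(rho):
    """Product-moment r inferred from rho (assumed conversion)."""
    return 2 * math.sin(math.pi * rho / 6)

def forecasting_efficiency(r):
    """Index of forecasting efficiency, as a proportion."""
    return 1 - math.sqrt(1 - r * r)

# Hypothetical per cent passing for six items in their printed order.
passing = [96, 88, 91, 70, 75, 60]
print(difficulty_ranks(passing), round(spearman_rho(passing), 3))
for rho in (0.42, 0.75):
    print(rho, round(100 * forecasting_efficiency(rho)), "per cent above chance")
```

Applied directly to the extreme rhos of .42 and .75, the index gives the 9 and 34 per cent quoted earlier in this section.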

An interesting point to be observed is that the correlations 
are highest on Form A and drop off for the other forms. It 
appears that the original test construction was most careful 
on the Form A and that the other forms were constructed by 
TABLE 7-B 

Intercorrelations between Order of Difficulty Obtained with Different 
Groups on Various Forms with 30 and 20 Minute Time Limits 

          Groups                           Inferred 
Form      compared       Rho      P.E.        r 
30 minute time limit 
A         II-V           .96      .01        .96 
C         II-V           .95      .01        .95 
20 minute time limit 
A         II-IV          .97      .01        .97 
B         II-IV          .95      .01        .95 
C         II-IV          .97      .01        .97 
D         II-IV          .94      .01        .95 





analogy. This is apparently true at least in the case of Forms 
A and C where identical types of questions occupy the same 
positions in the series. The order in which the questions in 
Form A are assigned correlates +.75 with the order of the 
items in difficulty; the corresponding correlation for Form C 
is only +.49. B and D are also apparently constructed to be 
analogically parallel, and some difference in the magnitude of 
the correlation is likewise obtained. 

Very frequently questions are greatly increased or decreased 
in difficulty by slight changes in their form. An example of 
a question which Otis selected for Form C as being identical 
in type and supposedly also in difficulty to a corresponding 


TABLE 7-C 

Intercorrelations between Order of Difficulty Obtained with 30 Minute 
Time Limit and with 20 Minute Time Limit for Various 
Forms on Equated Groups of Subjects 

                                               Inferred 
Form      Group                Rho      P.E.      r 
A         II and V comb.       .97      .01      .97 
B         II                   .93      .01      .94 
C         II and V comb.       .95      .01      .95 
D         II                   .94      .01      .95 
one on Form A, but which became much more difficult in the 
altered form is the following: 


Form A 
Question 19. A clock is related to time as a thermometer is to (?) 
1 a watch, 2 warm, 3 a bulb, 4 mercury, 5 temperature ( ) 
% of subjects passing = 92 
(Group V, 30 min.) 


Form C 
Question 19. A thermometer is related to temperature as a speedometer 
is to (?) 
1 fast, 2 automobile, 3 velocity, 4 time, 5 heat............( ) 
% of subjects passing = 59 
(Group V, 30 min.) 


An example where the analogical question on Form C was 
much easier than the comparable item on Form A may also be 
given: 


Form A 
Question 48. The opposite of treacherous is (?) 
1 friendly, 2 brave, 3 wise, 4 cowardly, 5 loyal............( ) 
% of subjects passing = 73 
(Group V, 30 min.) 


Form C 
Question 48. The opposite of cowardly is (?) 
1 brave, 2 strong, 3 treacherous, 4 loyal, 5 friendly......( ) 
% of subjects passing = 94 
(Group V, 30 min.) 


Because of the number of these fluctuations in difficulty from 
form to form it appears that adequate standardization contra- 
indicates the use of questions assumed to be similar without 
empirical test of their difficulty.⁴

⁴ It is interesting to note also that when an identical question is used
in two forms of the test, a closely similar percentage of subjects pass the
item (cf. question 18, forms B & D).

It might be thought that some of this displacement of the
questions in order of difficulty is due to the selective factors
operative in the latter sections of the test. If the last
questions were answered by a very select group, the percentage of
subjects passing would, of course, be increased and the
question would be rated as easier than would otherwise be the case.
As a means of determining whether this was responsible for
the low correlation, rank order correlations were obtained on
the first forty questions only, where 95 per cent of the subjects
attempt the items. The correlations were as follows:








TABLE 8

Form   Time      N     Rho     P.E.   Implied r
A      30 min.   989   +.27    .07    .28
A      20 min.   279   +.23    .07    .24
C      30 min.   981   +.22    .08    .23
C      20 min.   231   +.27    .07    .28
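
The "Implied r" entries in Table 8 agree with Pearson's conversion of a rank-order coefficient to a product-moment coefficient, r = 2 sin(πρ/6). The text does not state how the column was computed, so treating it as this conversion is an assumption, and the function name `implied_r` is only illustrative. A minimal sketch:

```python
import math


def implied_r(rho: float) -> float:
    """Convert a rank-order coefficient rho to a product-moment r.

    Assumes Pearson's relation r = 2 * sin(pi * rho / 6); the source
    does not name the formula behind its "Implied r" column.
    """
    return 2.0 * math.sin(math.pi * rho / 6.0)


# Rho values as printed in Table 8 (first forty questions only).
table_8 = [("A", "30 min.", 0.27), ("A", "20 min.", 0.23),
           ("C", "30 min.", 0.22), ("C", "20 min.", 0.27)]

for form, time_limit, rho in table_8:
    print(f"Form {form}, {time_limit}: rho = {rho:+.2f}, "
          f"implied r = {implied_r(rho):.2f}")
```

Rounded to two places, the four results reproduce the printed values of .28, .24, .23 and .28.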





It is apparent from these results that even for the group of 
questions which all of the subjects attempted, the questions 
are not arranged in the proper order of difficulty. The reduc- 
tion in the size of the correlation for the first 40 items from 
the value for the full 75 questions is presumably to be
explained principally on the basis of the reduced range of
difficulty involved. When the correlation for the 75 items is
predicted (Guilford, 6, p. 416) from the values for the first 40
items given in Table 8, an average discrepancy of about + .11 
from the values given in Table 7 is obtained. This difference 
is not statistically reliable but the predicted correlations are 
uniformly higher than the ones actually obtained. This indi- 
cates that some selective factors are at work, but that even with 
maximum allowance for these influences, the questions are 
markedly out of order of difficulty. 

The results were next analyzed to determine whether or not 
there are fluctuations in the order of difficulty with different 
groups of subjects and with different time limits. Extremely 
high correlations were obtained between the orders of the items 
obtained with different groups (Table 7-B). The constancy 
of the orders for different groups is further shown by the fact 
that the correlation between the difficulties of the questions 
obtained by us for Form A and those calculated from the data
of Chapanis (3) for the same form was +.96. Widely differ- 
ent adult groups then appear to give a relatively constant 
order of difficulty, but one which agrees only slightly with the 
one used by Otis. That the discrepancy is not due to sex dif- 
ference is indicated by Chapanis’ data. 

The correlations between the order of difficulty obtained 
with the 30 and 20 minute time limits are presented in Table 
7-C. Here a remarkable stability in the difficulty of questions 
even with different times of administration is indicated, a fact 
which can also be seen by inspection of Tables 5 and 6. 


Degree of Difficulty of Items 


From the theoretical analysis of Symonds (13) and the ex- 
perimental studies of Cleeton (4) and Thurstone (14), it is 
apparent that the optimal difficulty of a question is attained 
when about 50 per cent of the subjects pass the item (Guil- 
ford, 6). From the results of Tables 5 and 6 it will be ob- 
served that the present Otis test is entirely too easy for adult 
subjects. Even though the groups analyzed by us had mean 
scores very similar to those originally reported in Otis’ norms, 
between 55 and 60 per cent of the questions were answered by 
75 or more per cent of the subjects. For Chapanis’ subjects 
41 out of 75 questions were passed by 75 or more per cent. In 
the case of the highly superior adult groups there are as many 
as 80 per cent of the questions which are passed by 75 or more 
per cent of the subjects, and 40 per cent which are passed by 
90 or more per cent. These results indicate that the present 
test is inadequate in its discriminative power because of the 
excess of easy questions. 





To overcome the chief limitations of the test brought out by 
the preceding analysis and at the same time to retain many 
of its desirable features the authors have developed an 
abridged Otis test. The new test has been completely restand- 
ardized on over five thousand subjects. Items were selected 
from the original tests which had satisfactory validity and
were of constant rank in difficulty for a variety of different 
groups of subjects. The test, when administered with a 12 
minute time limit, has a reliability as high as that of the origi- 
nal Otis test, is finished by less than two per cent of the sub- 
jects, and has an order of difficulty which correlates with the 
order obtained on various new groups to an extent of over 
+.92. This abridgement will be described in detail shortly 
(16).


SUMMARY AND CONCLUSIONS


1. The Otis Self-Administering Test of Mental Ability 
(Higher Form) has had an extremely wide-spread use in the 
testing of adults despite the fact that its suitability for mea- 
suring such individuals has never been determined. The 
present investigation was designed to study the adequacy of 
this test for typical adult populations. Over 8,800 subjects 
chosen from business, industrial and educational groups were 
examined. 

2. The present results indicate a fundamental inadequacy
of the test for adult groups because of the fact that an unduly 
large number of subjects (almost 40 per cent) finished the 
examination within the allotted time of 30 minutes. The per- 
centage of subjects completing the test is greater the more 
intelligent the group tested. The administration of the test 
with a 20 minute time limit reduces the number finishing the 
test to a more suitable proportion (less than 10 per cent). 

3. Correlations are given between the number of items at- 
tempted, the number correct, and the number missed with the 
20 and 30 minute time limits. The correlation between the
number of questions attempted and the number correct is 
about +.65. Between the number of items attempted and the 
number missed the correlations are about +.25. An inverse 
correlation of about —.50 is obtained between the number cor- 
rect and the number missed. A check on these results is 
afforded by the fact that the partial correlations between the 
number of questions correct and the number missed with the 
number attempted held constant are all between -.90 and -1.00, since
theoretically this correlation should be a perfect negative one.

4. The items in the test are not properly arranged in order 
of difficulty for adult subjects. The correlation between the 
order of the items in the test and the ranks of the questions in 
terms of the percentage of subjects answering these items 
correctly is from +.45 (Form D) to +.75 (Form A). The 
difficulty of a given item tends to remain constant even with 
different times of administration and for different groups of 
subjects. It appears that Form A was the most carefully 
standardized and that the other forms were constructed by 
using items analogically similar. Examples of wide discrepan- 
cies in the difficulty of items chosen as being similar in type 
and difficulty on the different forms are given. 

5. For adult populations an excessive number of the ques- 
tions are inadequate in their discriminative power. Even with 
groups whose mean score on the test was exactly at the mean 
of Otis’ norms for college students, nearly 60 per cent of the 
questions were passed by over 75 per cent of the subjects. 
Almost a quarter of the questions were so easy that they could 
be passed by 90 or more per cent of the subjects. 

6. The items in the Otis test have been entirely restandard- 
ized by the authors. Abridged forms of the test have been 
prepared which can be administered in 12 minutes. The test 
has a reliability as high as the original Otis test despite the 
shorter time limit, and overcomes a number of the difficulties 
outlined above. Further description of these tests will appear 
in a subsequent publication. 
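
The check described in conclusion 3 can be illustrated with the standard first-order partial correlation formula. The sketch below uses only the rounded values quoted in the summary (+.65, +.25, -.50), not the exact coefficients for any one form, and the function and variable names are hypothetical. Because the number attempted is the sum of the number correct and the number missed, the partial correlation would be exactly -1 if the coefficients were exact.

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Approximate values quoted in the summary (assumed, rounded figures).
r_correct_missed = -0.50      # number correct vs. number missed
r_correct_attempted = 0.65    # number correct vs. number attempted
r_missed_attempted = 0.25     # number missed vs. number attempted

r_partial = partial_corr(r_correct_missed,
                         r_correct_attempted,
                         r_missed_attempted)
print(f"partial r (attempted held constant) = {r_partial:.2f}")
```

With these rounded inputs the result is about -.90, in line with the partial correlations reported.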


FACTORS INFLUENCING QUESTIONNAIRE RETURNS
FROM FORMER UNIVERSITY STUDENTS*


C. ROBERT PACE 
University of Minnesota 


The increase in follow-up studies of the economic and
professional success of college students has raised again
the problem of the representative nature of questionnaire
returns. Largest of these recent studies has been the
one coordinated by the U. S. Office of Education, which
surveyed nearly 50,000 graduates in 31 institutions throughout
the country.¹ Extensive studies have also been made by
Purdue² and Minnesota.³ The Purdue questionnaire was mailed
to a cross section of alumni and yielded 85 per cent returns.
Other questionnaires, however, have commonly been mailed to 
all alumni of recent years and have usually yielded about 50 
per cent returns. Even though the resulting picture of the 
economic status of the average college graduate has not been 
a particularly prosperous one, critics have still pointed out 
that only the relatively successful alumni may have answered 
the questionnaires, that those whose subsequent experiences 
have been characterized by failure and unpleasantness are not 
so likely to respond. Results obtained independently by these
surveys have been strikingly similar; but the criticism of
biased sampling is equally persistent. Since generalizations
from these surveys are so dependent upon sampling factors,
it is important to determine the differences between those who
answer questionnaires and those who fail to answer.

* Assistance in the preparation of these materials was furnished by
the personnel of the Works Progress Administration official Project
Number 665-71-3-69.

¹ W. J. Greenleaf, Economic Status of College Alumni. Bulletin 1937,
No. 10. U. S. Office of Education. Washington, D. C.: Government
Printing Office, 1939. 207 pp.

² E. C. Elliott, F. C. Hockema, and J. E. Walters, Occupational
Opportunities and the Economic Status of Recent Graduates of Purdue
University. Lafayette, Indiana: Purdue University, 1936. 24 pp.

³ A. C. Eurich and C. R. Pace, A Follow-Up Study of Minnesota
Graduates from 1928 to 1936. Minneapolis, Minnesota: University of
Minnesota Printing Department, 1938. 41 pp.

In 1936 the General College of the University of Minnesota 
projected a study of former Minnesota students.⁴ The purpose
of this study was to discover in as comprehensive and
detailed a way as possible the activities, problems, and atti- 
tudes of a representative group of young adults who had been 
out of school from about 5 to 13 years. Questions were pre- 
pared to cover four significant areas of living—vocational 
adjustments, home and family relations, socio-civic participa- 
tion, and personal life interests. A 52-page questionnaire, 
attractively printed and illustrated, was prepared. The sam- 
ple selected included 1600 cases—800 men and 800 women, 200 
each from the entering classes of 1924, 1925, 1928, and 1929. 
Each group of 200 was drawn also from entrants to the
College of Science, Literature, and the Arts; the College of
Agriculture, Forestry, and Home Economics; the College of
Education; and the College of Engineering and Architecture.
These four colleges absorb the major share of entering fresh- 
men at the University. The number chosen from each college 
was proportional to the total entering enrollment in that col- 
lege in each of those years. Within the college, the sample was 
a random selection from alphabetical lists.

⁴ A fuller description of the study is contained in The General College
Personnel Research Studies, University of Minnesota Mimeograph
Department, May, 1938. See section on ‘‘Procedures and Progress of the
General College Adult Study,’’ by C. R. Pace.

On December 2, 1937, the 52-page questionnaire was sent to
1507 of these 1600 former students. This represents the
number for whom addresses were known. A return envelope with
postage was also included. During the next three months five
follow-up notices were sent to those who failed to return the
questionnaire. In all, 951 usable replies were received, 59
per cent of the original 1600; 126 questionnaires were
returned unanswered because of wrong addresses. Therefore,
1381 questionnaires had been delivered. With this as a base
figure, replies were received from 69 per cent of those who got
the questionnaire.
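
The two response rates quoted above differ only in the base figure used. As a minimal sketch of the arithmetic (the function name and dictionary keys are illustrative, not part of the study):

```python
def response_rates(sampled: int, mailed: int, undeliverable: int, usable: int) -> dict:
    """Return the delivered count and the reply percentage on two bases."""
    delivered = mailed - undeliverable
    return {
        "delivered": delivered,
        "pct_of_sample": 100.0 * usable / sampled,
        "pct_of_delivered": 100.0 * usable / delivered,
    }


# Figures as reported in the text.
rates = response_rates(sampled=1600, mailed=1507, undeliverable=126, usable=951)

print(rates["delivered"])                 # questionnaires actually delivered
print(round(rates["pct_of_sample"]))      # per cent of the original sample
print(round(rates["pct_of_delivered"]))   # per cent of those delivered
```

The same 951 replies thus give 59 per cent on the full sample of 1600 and 69 per cent on the 1381 questionnaires delivered.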

For each of the 1600 cases, date of birth, the total number of 
quarters in school, and graduation or non-graduation were de- 
termined from University records. Length of residence and 
graduate or non-graduate status provided an indirect measure 
of academic success. It was possible first to compare those 
who returned the questionnaire with those who did not return 
it on the foregoing items. 

From the questionnaire itself the following items were 
selected as important factors influencing returns: occupational 
classification; relation of present job to field of specialization
at the University; income; job satisfaction score;⁵ economic
and cultural status scores;⁶ general adjustment and morale
scores.⁷ These measures are indices of economic, professional,
and personal success. While it was not possible to compare
directly returns versus non-returns on these items, a method
of analysis has been adopted which may furnish a reasonable
approximation to such comparison. This is the method of 
early versus late returns. 

⁵ Adapted from ‘‘Job Satisfaction Inquiry’’ by Robert Hoppock.
Published by the Psych. Corp., 1934. See Job Satisfaction, Harper and
Brothers. 1935. 303 pp.

⁶ Adapted from ‘‘Scales to Measure Urban Home Environment’’ by
Alice Leahy Shea. See The Measurement of Urban Home Environment.
University of Minnesota Press. 1936. 70 pp.

⁷ ‘‘Minnesota Survey of Opinions (short form)’’ by E. A. Rundquist
and R. F. Sletto. See Personality in the Depression, University of
Minnesota Press. 1936. 398 pp.

Early and late returns were qualified in one respect. On
the first page of the questionnaire, space was provided for the
subjects to sign their names if they would be willing to
cooperate further by having some representative of the University
come to interview them. These signatures indicated a more
than casual interest in the project and provided another clue
in studying early and late returns. Accordingly the first 100
signed questionnaires have been compared with the last 100 
unsigned questionnaires. An analysis of the dates on which 
questionnaire replies were received indicates that all of the 
first 100 signed questionnaires, with 5 exceptions, were re- 
ceived before the first follow-up notice could have taken effect. 
All of the last 100 unsigned questionnaires were from persons 
who had received at least 3 follow-up notices, from a majority 
who had received 4 follow-up notices, and from many who had 
received all 5 follow-up notices, before they finally returned 
the questionnaire. The early returns are those sent in without 
any prodding or urging and by persons who indicated a will- 
ingness to give more of their time to this University project. 
The late returns are those sent in only after considerable urg- 
ing and prodding and by persons who did not indicate any 
willingness to contribute more time to the project. 

A comparison of these early and late returns assumes that 
the late returns are more nearly like the non-returns than are 
the early returns. The logic underlying this assumption may 
be briefly described. Whether or not a person will return the 
questionnaire and when he will return it depend on a favorable
combination of all the factors which influence questionnaire
returns: interest; conscientiousness; habits of promptness;
time available; pleasurable associations with the source
of the questionnaire (the University); sufficient lack of
embarrassment with one’s present status to be willing to report
that status; and many other factors. Those who return the 
questionnaire can be said to possess more, perhaps both in a 
quantitative and qualitative sense, of the characteristics which 
lead to the act of returning a questionnaire than those who do 
not return it. Further, those who return the questionnaire 
promptly possess more of these favorable characteristics than 
do those who delay several weeks before returning it. It seems 
reasonable, therefore, to assume that if any combination of 
factors is influencing questionnaire returns, the early returns 
should be most heavily loaded with that combination, and the 
late returns should be more like the non-returns than the early
returns are. At least, a comparison of early and late returns 
should reveal differences in the same direction as would a com- 
parison of returns and non-returns. Certain factors operating 
to give a non-representative sampling in a questionnaire study 
may then be analyzed to see how they affect early and late 
returns as segments of the total range from early returns 
through delayed returns to absolute non-returns, or refusals 
to cooperate. 

In the present study, this logic can be tested first with refer- 
ence to differences in the item of academic success. Students 
who graduated from the University obviously had more ‘‘aca- 
demic success’’ than those who did not graduate. In the origi- 
nal sample of 1600, 38 per cent of the men graduated from the 
University. In the total returns 51 per cent of the men were 
graduates and 20 per cent of those who did not return the 
questionnaire were graduates. In both the early and the late 
returns there were 47 per cent men graduates. Evidently the 
method of early versus late returns does not reveal the same 
trend as does the direct comparison of returns versus non- 
returns. For women, however, the method proves more valid. 
The original sample included 43 per cent of graduates. The 
total returns show 51 per cent who were graduates, while only 
30 per cent of the non-returns were graduates. This same 
difference is reflected in the early and late returns. The most 
successful students returned the questionnaire early since 60 
per cent of the early returns were graduates. Only 44 per 
cent of the late returns were graduates. The late returns are 
more nearly like the non-returns than are the early returns. 
When the data were further analyzed for length of residence 
in the University, the same general findings resulted. The 
non-returns for both sexes were heavily weighted with stu- 
dents leaving at any time prior to the completion of nine 
quarters of work. The late returns for men were not
significantly different from the early returns in this respect; whereas
the late returns for women were overloaded with students
leaving prior to the completion of nine quarters of work, and
the early returns showed a much smaller percentage of such
cases. The factors of sex, age, and year of entrance to the
University may also be considered briefly. In general, men 
returned the questionnaire more promptly than women, but 
there was no difference between the total number of men and 
women who eventually replied. There was practically no dif- 
ference in age between the various samples for either sex. 
There was a slight tendency to get a larger proportion of total 
returns from students who have attended the University more 
recently (entered in 1928-29). But among the early returns 
for women, the older entrants (1924-25) tended to reply 
somewhat more quickly. 
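
The graduation percentages given earlier in this paragraph allow a rough internal consistency check, which is not part of the original analysis. If a fraction f of a group returned the questionnaire, the percentage of graduates in the whole sample should be the weighted average f × (per cent among returns) + (1 − f) × (per cent among non-returns). Solving for f gives the return fraction implied by the three rounded percentages; the function name below is hypothetical.

```python
def implied_return_fraction(pct_sample: float,
                            pct_returns: float,
                            pct_nonreturns: float) -> float:
    """Solve pct_sample = f * pct_returns + (1 - f) * pct_nonreturns for f."""
    return (pct_sample - pct_nonreturns) / (pct_returns - pct_nonreturns)


# Per cent graduates as reported: whole sample, returns, non-returns.
men = implied_return_fraction(pct_sample=38, pct_returns=51, pct_nonreturns=20)
women = implied_return_fraction(pct_sample=43, pct_returns=51, pct_nonreturns=30)

print(f"implied return fraction, men:   {men:.2f}")
print(f"implied return fraction, women: {women:.2f}")
```

The implied fractions, roughly .58 for men and .62 for women, are close to the overall reply rate of 59 per cent of the original sample, so the reported percentages hang together within rounding error.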

Actual differences between returns and non-returns in the 
important factors of economic and professional success are in- 
determinate. But comparisons can be made between early and 
late returns. These comparisons will reveal differences which 
probably are underestimates of the actual differences between 
returns and non-returns, but which will reveal the direction of 
the true difference. 

Table 1 shows the occupational distributions found in the 
total returns and the early and late returns. For both men
and women the early returns contain a greater proportion of 
people in the professional occupations than do the late returns. 
There are practically no differences between the percentages 
of early and late returns in other occupational groups for men. 
For women there seem to be differences between early and late 
returns in all the occupational groups, but since there is no 
consistent pattern in these differences, it is not possible to 
interpret the material clearly. 

In the relationship between students’ present jobs and their 
field of specialization at the University, the differences be- 
tween early and late returns are small and not statistically 
significant. There is, however, a slight tendency for the early 
returns to contain a larger proportion than the late returns of 
subjects whose jobs are in the same field as their University
specialization. 

A comparison of median incomes indicates that there is no 
consistent or significant tendency for the more prosperous men 
and women to return their questionnaires early. 

Table 2 groups together the remaining comparisons between 
early and late returns. A glance at the columns of critical 
ratios will reveal that there are no significant differences be- 
tween early and late returns. On General Adjustment and 
Morale negative ratios have been reversed, since a lower score 
means a more favorable adjustment and higher morale. Ex- 
cept in the General Adjustment scores for women the early 
returns have better scores than the late returns. The fact that 
all of the differences between early and late returns for men 
and all except one for women are in the same direction is indi- 
cative of a trend in the direction of the original hypothesis. 

In conclusion, the following factors appear to operate to pro- 
duce a higher selection among questionnaire respondents than 
was true among the original sample selected for study: (These 
factors are ones on which the early returns showed the highest 
selection, the late returns showed the lowest selection, and the 
total group of returns was in the middle position.)

1. Employment at the professional levels: both men and women.

2. Jobs in the same field as University specialization: both men and women.

3. Economic status: men.

4. Cultural status: women.

5. Job satisfaction: men and women.

6. Morale: men and women.

In this study, factors of income and general adjustment 
showed neither significant differences nor a consistent direc- 
tion of differences between early and late returns. Apparently 
these factors were not so important in influencing question- 
naire returns. 

Actual comparison between total returns and non-returns 
showed that graduation from the University and number of
quarters of University work completed were both important 
factors influencing returns; but factors of sex, age, and year 
of entrance to the University were relatively unimportant. 

Obviously, the method of comparing early and late returns 
is not a sufficiently sensitive test to indicate the extent of bias; 
but it does provide a simple and valuable tool for determining 
the probable direction of bias; and as such it might be used in 
judging the representativeness of returns in other questionnaire
surveys. Insofar as the present sample of former University
students is similar to other samples on which follow-up
surveys have been made, it is possible that the results of such 
surveys are similarly biased. 





SUBNORMAL GIRLS WITH DISCREPANT 
TEST PATTERNS 


THEODORA M. ABEL 


Manhattan High School for Women’s Garment Trades, 
New York City 


In a series of studies carried on at Letchworth Village, a
New York State institution for mental defectives, test
patterns were obtained which differentiated between subjects
who had succeeded and others who had failed in the skills of
pillow lace-making and weaving (2, 4). By test pattern is 
meant a relationship between performance of a subject on two 
or more psychometric tests. This relationship score (i.e., ratio 
or difference between one test score and another) was found 
to indicate differences in skill where individual scores did not,
or at least did not do so as effectively (5).

Pattern scores revealed also a definite relationship to a more 
general mode of adjustment than that of acquiring a specific 
skill: among one hundred high grade moron girls, pattern 
scores differentiated the successes and failures on parole and 
those who were never recommended for parole (3). 

In the present investigation, we proposed to study the prob- 
lem in reverse order, that is, to select two groups distinguished 
on the basis of test pattern, in this case the difference between 
a language and a non-language psychometric test, and find out 
what other differences in performance existed between the two 
groups in order better to understand their possible range of 
abilities and disabilities.

Among 400 subnormal girls with CA 15 and 16, all of whom 
were receiving the same training in the simpler processes of 
dressmaking and garment machine operating, there were two 
small groups with quite disparate patterns, i.e., ratios between
scores on a language test (Otis Intermediate S-A. Exam.) and 
a non-language test (Pintner). Both groups had the same 
Otis IQ scores, ranging from 60-80. One group of 30 girls, 
however, had Pintner IQs between 100 and 109, giving them 
a pattern score or average ratio of 1.50 (Pintner IQ divided 
by Otis IQ). The A.D. was .12 and the range 1.3-2.0. The 
other group, consisting of 26 girls, had Pintner IQs between 
60 and 80 and not more than three points higher than the Otis 
score in any case. The pattern score average for this group 
was .99 (A.D. .04, range .81-1.05). 

In educational and cultural backgrounds the two groups 
were fairly equivalent. Both came from ungraded and ad- 
justment classes in the public schools. In the group with the 
high pattern scores (over 1.3) 43 per cent completed 6B grade, 
while in the group with low pattern scores (under 1.04) only 
27 per cent did so. But this difference is not statistically
reliable. Using the χ² method, the association of pattern score
with grade completed in school was found to be: χ² = 1.39,
.30 > P > .20.

In the group with high pattern scores (HPS) and the one 
with low pattern scores (LPS) there was a predominance of
girls of Italian ancestry. In the former, 70 per cent had
Italian parents; in the latter only 54 per cent did so. This
difference, however, is not reliable statistically as can be seen
by the χ² method where in the association of pattern score
and Italian nationality, χ² = 1.55, .30 > P > .20. The other
girls in both groups came from several nationalities, Spanish, 
Jewish, Greek, Slavic. No negroes were included. In both 
groups, the fathers’ occupations could be classified in the 
lowest three classes (IV, V, VI) of the Minnesota Intelligence 
Occupational Scale. 

Our further comparisons fell into two categories: psycho- 
metric tests other than the Otis and Pintner, and success in 
trade training. The following psychometric tests were 
selected:

I Language tests: Monroe reading and the vocabulary of
the 1937 Binet.

II Non-language tests: Goodenough drawing, Knox cubes,
Designs from the Army Performance Scale, Diagonal
Formboard.

¹ One case in this group had an Otis IQ below 60 (that is, 51).

² Two or three cases who had high Pintner scores and low Otis scores as
a result of being foreign born and not having learned English adequately
were excluded from this study.

The first group of tests was selected in order to see to what 
extent the girls with high pattern scores showed merely
reading disability; the second group was selected in order to see
to what extent the girls with the same high pattern scores 
showed facility on different types of paper and pencil and 
manipulative tasks other than the Pintner. The tests were 
administered individually with the exception of the Monroe 
and the Goodenough drawing and were presented in the same 
order for each subject.³ Success in trade training was
estimated on the basis of the ratings of two teachers who judged
the girls’ trade work as poor, fair, fair plus or good. Girls 
with records of fair plus and good were considered by the 
teachers as capable of being trained in the garment trades, 
while those with poor or fair records were not deemed capable 
of profiting from this trade training. 


RESULTS 


Table 1 shows the performance on standardized tests of the
groups with high and low pattern scores. 

As can be seen, on the Monroe reading test there was no
marked difference between the two groups. The low group did
somewhat better than did the high group on reading rate.
But this difference is not statistically reliable. Using the χ²
method no significant association was found between reading
rate and pattern score (χ² = 1.551, .30 > P > .20).

On the vocabulary of the 1937 Binet there was also no
difference in performance between the two groups. As the
vocabulary test is given orally, it is apparent that the language
limitations of the high group are not only due to reading
disability.

³ Thanks are due Miss Jane Sill for administering the tests. The work
was carried on during the winter of 1938 in the Adjustment Classes of the
Manhattan High School for Women’s Garment Trades.
TABLE 1

Showing Comparative Performance of Groups with High and Low
Pattern Scores on Standardized Tests

         Monroe Reading #1                Vocabulary 1937 Binet
         Per cent attaining Grade IV      Per cent attaining score 13
Group    Rate     Comprehension
HPS      83       69                      83
LPS      67       61                      23

         Goodenough Drawing               Knox Cubes
         Per cent attaining score 11      Per cent attaining score 9
Group
HPS      83                               50
LPS      23                               0

         Army Designs                     Diagonal Formboard
         Per cent attaining score 12      Errors         Time
Group                                     30 or less     60" or less
HPS      67                               86             66
LPS      19                               80             44


Among the four tests with no language requirement, the 
high pattern group did markedly better than the low pattern 
group on three tests (Table 1). On the Goodenough drawing 
and Army designs, scores of 11 and 12 respectively were 
attained 3.5 times more frequently by the HPS than by the 
LPS group. On the Knox cubes, one half of the HPS group 
made a score of 9 or more, while none of the LPS group did 
so. These differences are all statistically reliable by the χ²
method.

The discrepancy of performance between the two groups on 
the Diagonal Formboard was not so great. The difference in 
number of errors between the two groups was insignificant. 
The HPS group completed the task somewhat faster than did 
the LPS group, but this difference in time between the two 
groups was not significant statistically (χ² = 2.51, .10 > P > 
.05). Apparently, in a manipulative task like the formboard, 
the girls with high pattern scores could not go faster than 
those with low pattern scores, nor did they find the task any 
easier. 

According to the teachers’ estimates, all but one of the 30 
girls in the high group were considered capable of trade train- 
ing. The one failure was interested in becoming a musician 
and never made any effort to cooperate. Four others were 
behavior problems, frequent truants and uneven in their work, 
but when they did get down to business, their work was rated 
as Fair Plus or Good. These four are included among the 
possible good workers. 

In the group of 26 with low pattern scores, one left school 
before an accurate rating was obtained. Of the other 25, 20 
or 80 per cent were considered unable to profit by the training 
sufficiently to do work in this field. Their ratings in speed 
and quality of work were not higher than Fair. Only 5, or 
20 per cent, were considered capable of trade training and as 
good workers as those in the high group. 

As a further check, a short sewing test was given individu- 
ally to the girls in each group. They were required to sew a 
hem as fast and as well as they could on the side of an 18” 
piece of unbleached muslin. The hem was basted and the 
needle threaded in advance. Each girl was allowed five min- 
utes for the work.4 The score was the number of stitches made 
in that time. When this test was given, some of the girls had 
already left school so that only 22 in the HPS group and 15 
in the LPS group did the hemming. In the high group, 17 or 
77 per cent made 45 stitches or over, while in the low group 
only 6 or 40 per cent did so. This difference is statistically 
reliable: in the association between pattern score and the num- 
ber of stitches, χ² = 5.26, .05 > P > .02. 
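
The χ² value for the hemming test can be checked from the counts given above. The following is a minimal sketch, assuming the ordinary Pearson χ² for a fourfold table without a continuity correction; the text does not say which form was used, and the function name is ours.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Counts taken from the text: 17 of 22 HPS girls and 6 of 15 LPS girls
# made 45 stitches or over in the five minutes allowed.
hps_pass, hps_fail = 17, 22 - 17
lps_pass, lps_fail = 6, 15 - 6

chi2 = chi_square_2x2(hps_pass, hps_fail, lps_pass, lps_fail)
print(round(chi2, 2))
```

Under that assumption the result is about 5.27, which agrees with the reported 5.26 up to rounding. With one degree of freedom the tabled values at P = .05 and P = .02 are 3.841 and 5.412, so the value falls in the interval stated in the text.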


DISCUSSION OF RESULTS 


We have seen that the high pattern group excels the low 
pattern group not only in the Pintner Non-language test, but 
in the cube imitation which resembles the imitation task in the 
Pintner, in copying designs from memory, and in drawing a 
man. It would seem, therefore, that the former had a special 


4 This test was carried out by the psychologist so that the procedure was 
approximately uniform for each subject. 
ability in the manipulation of the spatial field under visual 
control, an ability that was not dependent necessarily upon 
language ability. 

In an earlier study carried on by the author on cutaneous 
localization, we found evidence for the fact that superiority in 
a visual-spatial task, the Goodenough drawing, was related to 
superiority in the visual control in skin orientations (1). In 
other words, we found evidence for a possible superior ability 
in the handling of visual spatial relationships over and above 
language ability or even that of a general level of intelligence 
as measured by the Stanford-Binet. A small group of subnor- 
mals (11 in number) with MAs 11 or more on the Goodenough 
drawing but with MAs three years or more lower on the Stan- 
ford-Binet, made cutaneous localizations by a direct visual 
control method a good deal more accurately than did a group 
of normal unselected individuals. On the other hand, an un- 
selected group of subjects at the moron level of general intelli- 
gence, with IQs on the Otis S-A. Examination 60-80 (with 
one exception), made a poor showing in skin localizations by 
a visual control method, doing much worse than the normal 
group. This unselected group of subnormals did better by a 
tactual-kinesthetic method of localization, which was not the 
case with the specially selected group with high scores on the 
Goodenough. 

We have seen, also, that in trade training in hand-sewing 
and in garment operating as well as in a simple sewing test, 
the majority of girls with high pattern scores (Pintner IQ 
over 100 and Otis IQ under 80) were superior to those with 
low pattern scores (Pintner and Otis IQ 60-80). The girls 
with the high scores were succeeding in their trade training, 
provided, of course, that they were interested in their work. 
On the other hand, the girls with the low scores, with a few 
exceptions, failed to learn to do handsewing or to run a power 
machine sufficiently well to be recommended as capable of 
working in this field industrially. 

In the present investigation, we have not considered the 
large majority of girls receiving the same trade training as 
those in our selected groups. This majority has Pintner IQ 
75-85 and Otis IQ 60-70. Among these there are many suc- 
cesses and many partial failures in trade training. For them 
it would not be as possible to predict success and failure on 
the basis of Pintner and Otis scores alone, as it would be with 
individuals with more highly differentiated scores. 

In spite of their success in handsewing and garment operat- 
ing, however, it would seem that the special ability of the 
group with high pattern scores was not being exploited to the 
utmost. Because of the ability they revealed in copying de- 
signs, drawing the necessary details in representing a man, or 
doing the various paper and pencil tasks of manipulating 
spatial relationships required by the Pintner test, they might 
excel, for instance, in sample mounting where accuracy and 
fine discriminations are expected, or in making or copying 
simple designs for lamp shades, screens, pillows or table covers, 
and either painting or embroidering these designs. They 
might also succeed in loom weaving where ability to copy a 
design makes for superiority. It seems indicated that an 
attempt should be made to probe into and investigate the 
essential abilities of these individuals in order to provide a 
training that would develop their potentialities as fully as 
possible. 
ALL OR NONE VERSUS GRADED RESPONSE 
QUESTIONNAIRES 


EDWIN E. GHISELLI 
University of Maryland 


In the construction of questions designed to measure opinion 
on a controversial issue, one is often confronted with the 
problem of deciding upon the degree of fineness of response 
to be allowed. To a question such as ‘‘Do you think that large 
corporations are helping conditions in this country?’’ the 
respondent is often compelled to answer yes, no, or don’t know. 
With studies such as political surveys where the attempt is 
being made to predict in which of two ways the respondent is 
going to act at the polls, finer expression of opinion is per- 
haps not important. However, if the issue is one concerned 
with the measurement of opinion on some matter on which the 
respondent is not expected to act in an all or none fashion, it 
may be important to know the degree of favorableness or un- 
favorableness characteristic of the population. 

When this latter situation holds and the respondent is only 
permitted to make an absolute choice between mutually ex- 
clusive alternatives, it is implicitly assumed that the propor- 
tion of people responding in a given way is a direct measure of 
the ‘‘average’’ opinion of the whole population. Thus if 80 
per cent of a sample of Republicans and 80 per cent of a 
sample of Democrats answered no to the sample question given 
above, it would be concluded that there was a strong opinion 
against large corporations, an opinion supposedly as strong 
among Republicans as among Democrats. It is conceivable, 
however, that the Republicans who responded in the unfav- 
orable fashion were only so inclined to a slight degree, while 
the Democrats who responded in the same way were rabidly 
opposed to large corporations. Thus, a simple tabulation of 
the percentage of unfavorable responses would be a poor index 
of the attitudes of these two groups on the point in question. 
Similarly, if 65 per cent of a sample of the upper socio-eco- 
nomic group, and 95 per cent of the lower socio-economic 
group, answered no, it would ordinarily be concluded that 
there was a fairly wide difference in attitude toward large cor- 
porations on the part of these two socio-economic groups. 
However, this fairly large difference in percentage might re- 
flect only a small actual difference in attitude. The 65 per 
cent of the upper socio-economic group who responded in a 
negative fashion might be very unfavorable and the remain- 
ing 35 per cent only slightly favorable; while in the lower 
socio-economic group the 95 per cent who responded in a 
negative fashion might only be slightly unfavorable, and the 
remaining 5 per cent very favorable. Thus, if the respondents 
were allowed to make a finer expression of opinion the dif- 
ference might not be as great as a 2-step response makes it 
appear. 

A further problem connected with the decision of whether 
to use a 2-step or a multi-step response is the willingness of 
the respondents to make a judgment. While there is some 
experimental literature concerned with the relationship be- 
tween the number of steps in a rating scale and the reliability 
and validity of measurement, there is little or no exact infor- 
mation on the relationship between the number of alternative 
ways in which subjects may respond and their willingness to 
respond. 

In connection with a study of the belief in the sincerity of 
advertising the writer collected data pertinent to these two 
problems. Two hundred undergraduate students were given a 
questionnaire consisting of a list of 41 different common 
brands representing 12 commodity types. These brands were 
presented in random order, regardless of commodity type. 
The respondents were asked to indicate after each brand 
whether or not they thought the advertising of that brand was 
sincere. One group of 102 respondents was asked to check 
yes if it were felt that the advertising was sincere, and no if 
not. For another group of 98 respondents a 4-step rating 
scale, very sincere, fairly sincere, fairly insincere, and very 
insincere, was provided after each brand. 

In analyzing the results from the 2-step response question- 
naire, for each brand the don’t know responses were discarded 
and the percentage of favorable responses was computed from 
the remaining answers. In scoring the 4-step response ques- 
tionnaire a value of zero was given to a response of very in- 
sincere, 1 to fairly insincere, 2 to fairly sincere, and 3 to very 
sincere, assuming in the customary fashion that these steps 
are equal. On this basis a mean rating for each brand was 
computed for those subjects who responded to it. Higher 
average ratings, then, indicate stronger reported belief in sin- 
cerity, and a rating of 1.5 corresponds to 50 per cent on 
the 2-step response questionnaire as an indifference point. In 
addition, with the 4-step response questionnaire the two fav- 
orable categories were lumped together, and the per cent of 
favorable responses was also computed. 
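
The two scores derived from the 4-step questionnaire may be sketched as follows. The step values (0 to 3) and the lumping of the two favorable categories are those described above; the function name and the sample answers are hypothetical and serve only as illustration.

```python
# Step values as described in the text, assuming equal steps.
STEP_VALUES = {
    "very insincere": 0,
    "fairly insincere": 1,
    "fairly sincere": 2,
    "very sincere": 3,
}

def score_brand(responses):
    """Return (mean rating, per cent favorable) for one brand.

    Answers that are not one of the four steps (for example "don't know")
    are left out, so both scores rest only on subjects who responded.
    """
    values = [STEP_VALUES[r] for r in responses if r in STEP_VALUES]
    if not values:
        return None, None
    mean_rating = sum(values) / len(values)
    # The two favorable categories (values 2 and 3) are lumped together.
    pct_favorable = 100 * sum(1 for v in values if v >= 2) / len(values)
    return mean_rating, pct_favorable

# Hypothetical answers to a single brand.
sample = [
    "very sincere", "fairly sincere", "fairly sincere",
    "fairly insincere", "very insincere", "don't know",
]
mean_rating, pct_favorable = score_brand(sample)
print(mean_rating, pct_favorable)
```

For these six hypothetical answers, five are scored; the mean rating is 1.6 and 60 per cent of the scored answers are favorable.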


RESULTS 


Effects of Limitation of Ways of Answering on the Kind 
of Responses 


Two comparisons are to be made in order to observe any 
effects limitation of the ways of answering may have on the 
kind of responses. First of all, although a respondent may 
not express his opinion in an all or none fashion, but rather 
qualify his answer, the interrogator may record the answer as 
if it had been an all or none response thus disregarding the 
degree of expressed opinion and noting only its direction. The 
effects of such a procedure may be studied by comparing the 
average rating of each brand on the 4-step response question- 
naire with the per cent of favorable responses on that same 
questionnaire computed from lumping together the two fav- 
orable categories. In the case just described the respondent 
is not actually limited in the number of ways in which he may 
answer; the limitation, rather, is in the number of ways in 
which the interrogator is permitted to record the answers. 
The respondent himself, however, may be restricted to answer- 
ing in certain specified ways. Any effects this procedure may 
have on the kind of responses given may be studied by com- 
paring the per cent of favorable responses for each brand on 
the 2-step response questionnaire with either the average rat- 
ings or the per cent of favorable responses on the 4-step 
response questionnaire. 

Fig. 1. The Relationship between the Percentages of Favorable Re- 
sponses and the Average Ratings on the 4-step Response Questionnaire. 

The first comparison, then, is with the 4-step response ques- 
tionnaire between the average ratings and the per cent of fav- 
orable responses computed from lumping together the two fav- 
orable categories. In Fig. 1 these two sets of values are 
plotted against each other. The dotted curve represents the 
nature of this relationship. Had differences on these two 
scales been directly comparable this curve should have been a 
straight line as indicated by the solid line in the figure. It 
will be observed in Figure 1 that the actually obtained rela- 
tionship and the theoretical relationship do not coincide. The 
actual relationship is curvilinear rather than rectilinear. Thus 
differences of equal amount on the percentage scale do not 
necessarily indicate equal differences on the 4-step response 
scale. For example, the difference between the ratings cor- 
responding to the 70 and 80 per cent points is 0.2, while the 
difference in ratings corresponding to the 90 and 100 per cent 
points is 0.7. It will be noted, however, that the indifference 
points on the two scales, i.e., 50 per cent and a rating of 1.5, 
correspond. Had the interrogator, then, merely noted whether 
the responses were favorable or unfavorable, and not the ex- 
tent to which they were favorable, the difference in reported 
belief in sincerity of advertising for two brands would be 
greater or less than the difference between them had some 
allowance been made for the qualification of opinion. 

Comparison of the responses to each brand on the 2-step and 
4-step response questionnaires will show any effects of limiting 
the respondent in the number of ways in which he may answer. 
In Fig. 2 the mean ratings on the 4-step response questionnaire 
are plotted against the per cent of favorable responses on the 
2-step response questionnaire. Again in this figure, the dotted 
curve indicates the nature of the obtained relationship while 
the solid line is that to be expected if differences on the two 
scales were proportional. As in the previous case it is to be 
observed that these two measurements do not yield propor- 
tional differences. For example, the difference in ratings 
corresponding to the zero and 10 per cent points is 1.0, while 
the difference between the ratings corresponding to the 90 and 
100 per cent points is only 0.2. 

In addition it will be noted that the indifference point on 
the per cent scale, 50 per cent, does not correspond to the indif- 
ference point on the 4-step response scale, 1.5, but rather cor- 
responds to an indicated favorable attitude, 2.0. The direc- 
tion of this difference is characteristic of all points on the scale. 
That is, an item of a given degree of favorableness as indi- 
cated by the percentage of people responding to it favorably 
on the 2-step response questionnaire is responded to in a more 
favorable fashion on the 4-step response questionnaire. 

Fig. 3. The Relationship between the Percentages of Favorable Re- 
sponses on the 2-step and 4-step Response Questionnaires. 

When the per cents of favorable responses for each brand on 
the two questionnaires are plotted against each other, as in 
Fig. 3, a curve similar to that in Fig. 2 results. Again the 
two scales will be seen not to yield proportional differences, the 
points of indifference do not correspond, and on a given item 
there are fewer people responding in a favorable fashion when 
only a 2-step response is permitted than when a 4-step re- 
sponse is permitted. Thus, differences in reported belief in 
sincerity of advertising vary with the number of ways in which 
the respondent is permitted to answer. Furthermore, the 
respondents who were only permitted to indicate whether or not 
they thought the advertising was sincere expressed less con- 
fidence in its being sincere than did those respondents who 
were permitted to give a qualified answer. 


Comparison of Willingness to Respond on the 2-step and 4-step 
Response Questionnaires 


For each brand the per cent of don’t know responses on the 
4-step response questionnaire was subtracted from the per cent 
of don’t know responses on the 2-step response questionnaire. 
In only one instance did the same number of persons respond 
to a given brand on both questionnaires, and in one case 25 
per cent fewer persons responded to a brand on the 2-step 
response questionnaire than did to that same brand on the 
4-step response questionnaire. To all brands but one, then, 
there were fewer responses to the items on the 2-step response 
questionnaire than on the 4-step response questionnaire, the 
median of these differences being 12 per cent. 

For each brand the critical ratio of the difference between 
the per cent of people responding on the 2-step response ques- 
tionnaire and the per cent responding on the 4-step response 
questionnaire was computed. Fifteen of these critical ratios 
were below 2.5, nine between 2.5 and 3.0, and seventeen were 
3.0 and above. In general, then, most of these differences are 
significant statistically. Thus when a 4-step response was per- 
mitted more people were willing to respond than when only a 
2-step response was permitted. 
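
The critical ratio for a single brand may be sketched as below. The text does not give its formula, so the sketch assumes the usual form, the difference between two independent proportions divided by the standard error of that difference. The group sizes (102 and 98) are those reported above; the response rates are hypothetical, since the per-brand figures are not given.

```python
import math

def critical_ratio(p1, n1, p2, n2):
    """Difference between two independent proportions over its standard error."""
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / standard_error

# Hypothetical brand: 70 per cent of the 102 subjects in the 2-step group
# responded, against 85 per cent of the 98 subjects in the 4-step group.
cr = critical_ratio(0.70, 102, 0.85, 98)
print(round(abs(cr), 2))
```

A brand with these hypothetical rates would fall in the middle class of critical ratios described above, between 2.5 and 3.0.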


CONCLUSION 


With the particular material and subjects used in the pres- 
ent investigation, it was shown that when the qualifications 
in strength respondents put on their expression of opinion 
on a controversial issue were disregarded, and simply a record 
taken of whether their opinion was favorable or unfavorable, 
the results were different from those obtained when the quali- 
fications were taken into account. When the subjects them- 
selves were limited to making an absolute choice between 
mutually exclusive alternatives rather than being permitted 
to qualify the strength of their opinion, their expressed opin- 
ion was distinctly less favorable, and fewer of them were 
willing to respond. 

With questions concerning other topics, and with other sub- 
jects these same differences might not appear. The data here 
presented are enough, however, to point out that the number 
of ways in which respondents are permitted to answer ques- 
tions may be a factor influencing measurement of their opinion 
on a given topic, and their willingness to respond. Further- 
more, they are sufficient to cast grave doubt on the assumption 
implicit in the use of the 2-step response (as yes-no, favorable- 
unfavorable, etc.) questionnaire method as a measure of 
‘‘average’’ opinion. 


LOGIC—MACHINE SCORED? 


ERLAND NELSON 
Newberry College, South Carolina 


When old traditional logic with its rules, its circles, 
undistributed middles, illicit majors, and incomplete 
disjunctions goes ‘‘streamline,’’ we should be getting 
somewhat nearer the scientific age in education. At any rate, 
the material in a logic course is not only susceptible to objective 
test construction but also to electrical scoring. 

As a tentative experiment, the writer designed a test as a 
final examination for a first quarter’s work in college logic 
which could be scored by the I.B.M. electrical scoring machine. 
Although it is possible to use several types of objective tests 
on these machines, the multiple-choice type with five alter- 
natives for each item was chosen as most suitable in this case. 
This test of 150 items, each of which provides five alternative 
responses—a total of 750 alternatives—was then given the 
writer’s logic class of 63 sophomores as a final examination 
at the end of the quarter’s course. 

A few items chosen at random from the test will indicate 
the form into which logic can be cast for electrical scoring: 


  1. Logic may best be defined as the: 1 art of correct thinking, 2 sci- 
     ence of correct thinking, 3 science of the mind, 4 art of criticism, 
     5 process of association. 

 10. ‘‘All men are mortal. (Poor old Socrates again!) 
     Socrates was a man. 
     ∴ Socrates was mortal.’’ 
     This reasoning may best be designated as the type of logic known 
     as: 1 inductive, 2 transcendental, 3 symbolic, 4 universal, 5 deductive. 

 62. To be great is to be misunderstood. 
     Our governor is misunderstood. 
     ∴ Our governor is great. 
     This argument may best be designated as: 1 valid, 2 illicit major, 
     3 illicit minor, 4 undistributed middle, 5 denying the antecedent. 

 63. If the Wage-Hour bill passes, unemployment will increase. 
     Unemployment has increased. 
     ∴ The Wage-Hour bill must have passed. 
     This argument may best be designated as: 1 valid, 2 illicit major, 
     3 affirming the consequent, 4 denying the antecedent, 5 false con- 
     version. 

108. Abolition of college language requirements would mean progress for 
     we know there can be no progress without change. 
     This reasoning is a case of: 1 valid argument, 2 undistributed 
     middle, 3 illicit major, 4 simple accident, 5 double middle. 

114. Miss Phail, the poorest student on the Campus, purchased a new 
     Parker pen and since then has received ‘‘straight A’s.’’ On this 
     evidence alone, Miss Phail concludes that ‘‘Parker pens pay.’’ 
     This reasoning is: 1 valid argument, 2 a ‘‘post hoc,’’ 3 a case of 
     begging the question, 4 a complex question, 5 a fallacy of division. 

With 750 responses available, the test is not only quite com- 
prehensive for a one-quarter college course, but it also gives 
latitude for the inclusion of a considerable number of ordinary 
problems of reasoning. Yet the time for administering this 
examination is only the regular two-hour examination period. 
In fact, 90 minutes will suffice for two-thirds of the students. 
The time required by the writer for scoring, rechecking, cal- 
culating class means and deviations, and recording grades for 
these 63 students is slightly more than two hours. 

Although this was a ‘‘first trial’’ of this test with the ‘‘in- 
evitable bad items’’ still included, the writer found a reliabil- 
ity of .92 by the odd-even technique (Spearman-Brown 
formula). Each of the 150 statements will, of course, be sub- 
jected to a careful item-analysis before the test is used again— 
the interesting point here is that logic can be and is being 
machine scored. 
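
The odd-even technique with the Spearman-Brown formula may be sketched as follows. This is an illustration only: the item data are hypothetical, the function names are ours, and the correlation between the two halves of the actual test is not reported in the text.

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_brown(r_half, k=2):
    """Estimated reliability of a test k times as long as the half-test."""
    return k * r_half / (1 + (k - 1) * r_half)

def odd_even_reliability(item_scores):
    """item_scores: one list of 0/1 item scores per student, items in test order."""
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd_totals, even_totals))

# Hypothetical scores of four students on a four-item test.
scores = [
    [1, 1, 1, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
]
reliability = odd_even_reliability(scores)
print(round(reliability, 2))
```

Read in reverse, the formula implies that a stepped-up reliability of .92 corresponds to a correlation of about .85 between the odd and even halves, since 2(.852) / (1 + .852) is approximately .92.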


NOTE ON THE IQ OBTAINED FROM THE 
OTIS GROUP INTELLIGENCE SCALE, 
ADVANCED EXAMINATION 


EDWARD E. CURETON 
Alabama Polytechnic Institute 


In spite of the introduction of new tests in recent years, the Otis Group 
Intelligence Scale, Advanced Examination, remains one of the most 
reliable and valid of all group intelligence tests. The author states, how- 
ever, that for pupils of a given chronological age, the variability of mental 
ages obtained from this test is greater than that obtained from the Binet 
test. It also differs from the Binet test in that the variability of its men- 
tal ages is approximately constant from year to year, instead of increasing 
regularly with increase of mean chronological age. For these reasons Otis 
has devised a special technique for computing the Index of Brightness or 
IB. (This name is discarded in favor of IQ in later editions of the man- 
ual, but is used here to denote the measure computed according to his 
directions.) The IB is based on the difference between MA and CA in- 
stead of on their ratio, and is recommended by Otis as the best estimate 
of Binet IQ that can be obtained from this test.* This technique has 
been criticized from time to time, and the present note offers an empirical 
evaluation of the criticisms. 

The original Stanford-Binet Intelligence Scale, the New Stanford 
Achievement Test, Advanced Examination, Form V, and the Otis Group 
Intelligence Scale, Advanced Examination, Form A, had been given to 83 
seventh and eighth grade students in connection with another study.** 
The Binet IQ’s and the New Stanford EQ’s were computed in the usual 
manner. The Otis IB’s were computed first by the method recommended 
by the author, and then the IQ’s were computed by the usual method. 
From the distributions of mental and educational ages made in connection 
with the previous study it appeared that Otis’ method of computing the 
IB reduced the variability, and the usual method of computing the IQ 
when used with this test increased it, in comparison with the distributions 
of Binet IQ and New Stanford EQ. It then occurred to the writer that 
a useful empirical compromise might be to compute both measures and 
average. This was done for the 83 cases, and the resulting measure was 

* Otis Group Intelligence Scale, Manual of Directions for Advanced 
Examination, World Book Co., Yonkers-on-Hudson, N. Y. 

** Cureton, Edward E. ‘‘The Accomplishment Quotient Technic.’’ 
J. Exp. Educ., V, 315-326, 1937. 
called the ‘‘IQ’’ to distinguish it from the IB and IQ. Comparisons 
between these various measures are shown in Tables I and II. 


TABLE I 
Means and Standard Deviations (N = 83) 

                     M        SD 
Otis IB            110.9     11.9 
Otis IQ            118.6     20.0 
Otis ‘‘IQ’’        114.8     15.7 
Binet IQ           108.3     17.2 
Stanford EQ        109.4     16.7 





TABLE II 
Correlations, Average Differences, and Root-Mean-Square 
Differences (N = 83) 

                              r      Av. diff.   r-m-s diff. 
Otis IB-Binet IQ            .882       2.6          9.1 
Otis IQ-Binet IQ            .839      10.3         15.0 
Otis ‘‘IQ’’-Binet IQ        .865       6.5         10.8 
Otis IB-Stanford EQ         .940       1.5          7.0 
Otis IQ-Stanford EQ         .899       9.2         12.8 
Otis ‘‘IQ’’-Stanford EQ     .927       5.4          8.3 




The root-mean-square difference is obtained by squaring the actual dif- 
ference between each pair of quotients, averaging the squares, and extract- 
ing the square root. It differs from the standard deviation of the differ- 
ences in that it is not corrected for the fact that the average difference is 
some value other than zero. It is probably the most revealing of the 
comparison-measures used here, since it indicates directly (in the least 
squares sense) how bad the Otis IB, IQ, or ‘‘IQ’’ is as an estimate of 
Binet IQ or Stanford EQ. 
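
The relation between the root-mean-square difference and the standard deviation of the differences may be sketched as follows. The quotients are hypothetical; the sketch assumes that the average difference is the signed mean and that the standard deviation is computed with N rather than N - 1 in the denominator, in which case the squared r-m-s difference equals the squared standard deviation plus the squared average difference.

```python
import math

def difference_summary(xs, ys):
    """Return (average difference, r-m-s difference, S.D. of the differences)."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    average = sum(diffs) / n
    rms = math.sqrt(sum(d * d for d in diffs) / n)
    # Unlike the r-m-s difference, the S.D. is taken about the average difference.
    sd = math.sqrt(sum((d - average) ** 2 for d in diffs) / n)
    return average, rms, sd

# Hypothetical quotients for four pupils.
otis = [110, 120, 95, 130]
binet = [105, 118, 100, 121]
average, rms, sd = difference_summary(otis, binet)

# Same identity applied to the Otis IB-Binet IQ row of Table II
# (average difference 2.6, r-m-s difference 9.1).
implied_sd = math.sqrt(9.1 ** 2 - 2.6 ** 2)
print(round(rms, 2), round(sd, 2), round(implied_sd, 2))
```

On these assumptions the Otis IB-Binet IQ differences would have a standard deviation of about 8.7, somewhat below the r-m-s difference of 9.1, the remainder reflecting the average difference of 2.6 points.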

The standard deviation of the Otis ‘‘IQ’’ is closer to those of the Binet 
IQ and the Stanford EQ than is the standard deviation of either the Otis 
IB or the Otis IQ, as had been anticipated from previous examination of 
the corresponding mental and educational ages (see Table I). In all the 
other comparisons, the Otis IB is best, the Otis IQ worst, and the Otis 
‘‘IQ’’ intermediate. Pending more adequate information or some still 
better approximation, therefore, the IB (or IQ as it is now called in the 
author’s manual) should continue to be employed as recommended by Otis, 
as the best estimate of Binet IQ (and also of Stanford EQ) to be obtained 
from this test. 





NEWS AND NOTES 


This JOURNAL wishes to announce that copies of the February issue are 
to be had at the regular price of $2.00. This issue was devoted entirely 
to Radio Research and Applied Psychology, with Dr. Paul F. Lazarsfeld, 
Director of the Princeton University Radio Research Project, as Guest 
Editor. Orders for copies should be sent to the Editor, JOURNAL OF 
APPLIED PSYCHOLOGY, Ohio University, Athens, Ohio. 


According to an announcement by Will H. Hays, President of the 
Motion Picture Producers and Distributors of America, there has been 
prepared a feature-length picture giving a graphic story of the highlights 
of American history as the Motion Picture Industry’s exhibit at the New 
York World’s Fair. The picture is being shown in the Federal Building 
at the Fair under the auspices of the United States Commission. Another 
pictorial history which emphasizes the development of the West has been 
prepared for the International Golden Gate Exposition in San Francisco. 
Fifteen main episodes of American history have been assembled by Dr. 
James T. Shotwell, Director of the Division of Economics and History of 
the Carnegie Endowment for International Peace. The fifteen episodes 
include: I. Beyond History (The earliest home of man); II. The Ameri- 
can Saga in the Middle Ages; III. The Old World Finds the New; IV. 
Mankind’s Second Chance (The chance to create a new world); V. The 
Thirteen Colonies and the Old World’s Dreams of Empire; VI. The Strug- 
gle for Independence; VII. The Constitution; VIII. The Conquest of the 
Wilderness; IX. A Nation Divided; X. Welding the Nation; XI. Million- 
handed Industry; XII. The Awakening of the Social Conscience; XIII. 
From Isolation to World Power; XIV. World War and World Peace; XV. 
America Faces the Future. 


Dr. C. M. Louttit, Indiana University, Bloomington, Indiana, is attempt- 
ing to compile a directory of psychological associations in the United 
States. He would appreciate having the secretaries of such associations, 
especially those of a local nature, send information about their organiza- 
tions to him. The information desired includes name, purposes, officers, 
size of membership, and frequency and date of meetings. 


Stanford University School of Education announces a Conference on 
Educational Frontiers from July 7-9, immediately following the summer 
meeting of the National Education Association in San Francisco, Cali- 
fornia. The California Congress of Parents and Teachers is joining the 
School of Education in this Conference. Among the general sessions 
speakers will be Dr. Lewis M. Terman, Stanford University, whose subject 
will be ‘‘ New Evidence on the Nature of the Human Organism.’’ Other 
outstanding speakers will be Dr. John W. Studebaker, U. S. Commissioner 
of Education; Dr. Howard W. Odum, University of South Carolina; and 
Dr. Jesse H. Newlon, Teachers College, Columbia University. 


The California Test Bureau, 3636 Beverly Blvd., Los Angeles, Califor- 
nia, announced recently the publication of a new intelligence test, the 
California Short Form Test of Mental Maturity. This form retains the 
Language and Non-Language features of the longer edition and may be 
given in one class period. Three IQ’s are secured from the test—Lan- 
guage, Non-Language, and a Combined IQ, making it possible to diagnose 
learning difficulties which may be due to reading difficulty or the verbal 
factor. The Short Form is available in five levels: Preprimary Series 
(Kgn—Ent. 1), Primary Series (Grs. 1-3), Elementary Series (Grs. 4-8), 
Intermediate Series (Grs. 7-10), and Advanced Series (Grs. 9—Adult). 


At the annual meeting of the Midwestern Psychological Association held 
recently at the University of Nebraska, the following new officers for the 
year 1939-40 were announced: President, J. P. Guilford, University of 
Nebraska; Secretary-Treasurer, Robert H. Seashore, Northwestern Uni- 
versity; and Council member, N. R. F. Maier, University of Michigan. 
The fifteenth annual meeting will be held May 3 and 4, 1940, at the Uni- 
versity of Chicago. 


The Department of Psychology of Cornell University is offering its 
facilities for study and research to psychologists during the summer of 
1939. The psychological laboratories, seminaries, libraries, and the ani- 
mal field station will be open without fees to investigators attending the 
summer research station. No tuition nor any other fees will be charged 
those with the doctoral degree. Others wishing to attend will be subject 
to the usual summer school fees. Ithaca provides also many facilities for 
summer recreation—swimming, golf, tennis, boating, etc. The Research 
Station in Psychology will be open this summer from June 20 to Septem- 
ber 1; attendance may begin and end at any time, however, within those 
limits. Those persons desiring to attend should apply for admission as 
early as possible because of limited room and desk space. 


Test Service Bulletin for May, 1939, published by the Test Division of 
The Psychological Corporation, New York City, contains a list of new 
tests and books recently published which should be of interest to psychologists
and educators. First is mentioned the ‘‘Wechsler-Bellevue Intelligence
Scale,’’ by David Wechsler. This is an individual examination of
general intellectual level, designed for use with adolescent and adult sub- 
jects, particularly suited for the classification of delinquents, abnormal 
individuals and prison inmates. Its flexibility permits modification for 
use with illiterates or the handicapped. The scale contains tests both of 
the verbal and performance types. ‘‘Tests of Mental Development,’’ by
F. Kuhlmann, is a new individual examination in which mental develop- 
ment is measured in terms of median abilities of children of different 
ages. Results are expressed in terms of mental age and IQ, or the more 
refined IQ substitute, Percent of Average. Quantitative indices of speed, 
accuracy, and variability are readily obtained. 

‘‘Occupational Orientation Inquiry,’’ by G. A. Wallar and S. L. Pressey,
is so planned that the student begins on the front page by reviewing his 
vocational interests and experiences. On the two inside pages is a care- 
fully compiled list of 224 occupations, ranging from those requiring no 
special training to those necessitating post-collegiate technical training. 
The student is asked to indicate the degree of his knowledge, interest, 
ability, and opportunity for each occupation. Having done this he is 
asked on the back page to evaluate his vocational problem in the light of 
all the above considerations and to isolate for further study the occupations
in which he has the greatest possibilities. For high school seniors
and college students. 

Dr. Hugh M. Bell has now prepared a form of his Adjustment Inven- 
tory for use with persons of adult age not in school designed to measure 
home, social, health, occupational and emotional adjustment. 

Dr. Arthur E. Traxler has designed a high school reading test to measure 
both speed and comprehension in reading simple social study material and 
to measure grasp of content with more difficult social and natural science 
paragraphs aside from speed. For grades 10 to 12. 

The Detroit General Aptitude Examination, by Harry J. Baker, Paul 
H. Voelker, and Alex C. Crockett, consists of three tests: intelligence, 
mechanical and clerical. For grades 6 to 12. 

The Rorschach Record Blank prepared by Bruno Klopfer and Helen 
H. Davidson consists of four pages with space for recording and sum- 
marizing responses. The conventional symbols for frequent response 
categories are defined and brief instructions for recording are given. 

‘‘Gates Reading Survey,’’ by Arthur I. Gates, is intended to supplement
rather than to supplant the Gates Silent Reading Test. It provides for
four phases of reading ability: vocabulary, level of comprehension, speed
of reading, accuracy of comprehension. For Grades 3-10; two forms, I
and II.

A Rating Form for Use of Interviewers and Oral Examiners is a modification
of one prepared by Dr. Walter V. Bingham for the examination
of applicants for public employment. Nine characteristics are rated as
follows: voice and speech, appearance, alertness, ability to present ideas, 
judgment, emotional stability, self-confidence, friendliness and general 
fitness. 

‘‘Test of Critical Thinking in the Social Studies,’’ by J. Wayne Wrightstone,
is devised to measure three important aspects of thinking: obtaining
facts, drawing conclusions, applying generalization. For grades 4-6. 
Forms A and B. 

Among the books listed in the Test Service Bulletin are the following: 
‘‘A Dictionary of Terms in Measurement and Guidance,’’ by Earl Bennett
South; ‘‘A Bibliography of Mental Tests and Rating Scales (1939
Revision),’’ by Gertrude Hildreth; ‘‘Mental Tests (Revised Edition),’’
Frank N. Freeman; ‘‘Collection and Presentation of Statistical Data in 
Psychology and Education,’’ Martin F. Fritz. 

Further information concerning these tests and books may be obtained 
by writing to the Editor of the Test Service Bulletin, Dr. George K. Ben- 
nett, Psychological Corporation, 522 Fifth Avenue, New York City. 


The American Association for Applied Psychology will hold its annual 
professional conference Friday through Sunday, December 1-3, 1939, in 
Washington, D.C. The Association had voted to meet with the Interna- 
tional Congress of Psychotechnology should it be convened in America 
but that has proved impracticable. The By-Laws of the Association pro- 
vide that ‘‘insofar as possible the Association shall coordinate its program 
with that of the A.P.A.’’ It did not seem the part of wisdom, however, 
to take the annual meeting of so young an Association so far from its
center of population as would be necessary if it met this year with A.P.A. 
The Association therefore voted for this year to meet in the East at a 
time not in conflict with the A.P.A. There is no implication that the two 
associations shall continue to meet separately. 


In view of the large number of original studies of abilities, aptitudes 
and the selection and training of nurses, this JOURNAL is pleased to pub- 
lish the following note from The Answer Sheet, published by the Test 
Scoring Department of the International Business Machines Corporation:
‘‘A new division of the Psychological Corporation, The Nurse Testing
Division, under the direction of Miss Edith Potts, has been started re- 
cently. This division offers to assist any school of nursing by adminis- 
tering, scoring and reporting the results of tests administered to applicants 
for admission to the schools. The Test Scoring Machine is used to score 
many of the examinations. 

‘‘Applicants to the participating schools are required to take a number
of tests which are administered at centrally located points throughout the
country under the supervision of representatives of the Nurse Testing
Division. The Division scores the tests and makes a report to the schools, 
who use the findings in connection with other available information in 
selecting their students. The report to the schools includes recommenda- 
tions as to the probability of the applicant’s success. 

‘‘The tests administered in this program measure scholastic aptitude,
mechanical aptitude, speed and accuracy of reading and personality traits. 
The tests have proved particularly useful to the schools in (1) making a 
better selection among applicants, (2) giving a knowledge of weaknesses 
in the background and preparation of the applicants, (3) discovering 
exceptional students early in the course so that they may be given ample 
opportunity for full development. At the present time about 150 schools 
are using the service and about 6000 applicants are tested each year.’’ 

The following have been added to the list of tests which are now avail- 
able in machine-scored form: Gray-Votaw General Achievement Test 
Forms M and N, for grades 4-8. Sub-tests in Elementary Science, Social 
Studies, Knowledge of Literature, Choice of Words, Reading Vocabulary 
and Comprehension, and Arithmetic. Total working time, 73 minutes. 
Published by The Steck Company, Austin, Texas. Auto and Highway 
Safety Test, Form A, by H. T. Manuel. Printed on two sides of a
machine-scored answer sheet. Published by The Steck Company, Austin,
Texas.


BOOK REVIEWS 


Rogers, Carl R. The Clinical Treatment of the Problem Child. Houghton
Mifflin Company, 1939.


After the child’s problem has been clarified, the next natural step is to 
do something about it. Shall one ‘‘psychoanalyze’’ and ‘‘treat’’ simul- 
taneously? Shall ‘‘passive therapy’’ be used? Is ‘‘habit training’’ 
sufficient? While Rome burns, Nero fiddles. While the child waits for 
treatment, the doctors theorize. The author thinks that ‘‘if we are to 
study treatment methods we should consider the steps and the techniques 
which are actually used by those dealing with children.’’ ‘‘. . . what do 
workers do to change the behavior of problem children?’’ This is a sane 
and practical beginning for making treatment scientific. 

Part I presents ways of understanding the child, with emphasis upon 
a component-factor method, in which are considered heredity, mentality, 
family influences, economic and cultural factors, etc. 

In Part II the author comes to ‘‘grips’’ with his subject. In a force- 
ful manner he discusses the foster home and institutional placement as 
methods of treatment; but he admits that the criteria for the removal of 
a child from his home are too subjective. 

If a child is kept in the home (Part III), treatment often involves 
changing parental attitudes, alleviating marital friction, and perhaps 
arousing new motives. The effect of the school in changing behavior has
not been experimentally determined. 

Part IV, dealing with the individual, is in the opinion of the reviewer 
the best part of the book. The author sets forth the qualifications of a
therapist, the techniques which are effective, and finishes with a discussion 
of integrating treatment ‘‘factors’’ in order to secure results. 

The material of the book is drawn from a wide experience and a sound 
point of view. It is well organized, and the style of writing is easy and 
pleasing. 

J. R. Gentry,
Ohio University


CHESTER C. BENNETT. An Inquiry Into the Genesis of Poor Reading. 
Teachers College Contributions to Education, No. 755, 1938. Bureau 
of Publications, Teachers College, Columbia University. 

This book is significant in its largely negative findings. To all who 
teach young children, to all concerned with the way children learn, not 
only reading, but all learning, to those administering remedial instruction 
to reading failures, to those advocating a pet theory of some kind or other,
this book deserves a place on their ‘‘must’’ list. It will disappoint its
readers, yet its very disappointing character is its most valuable asset. 

The author conducted an investigation designed to unearth the genesis 
of poor reading or reading failure. It was hoped to offer means whereby 
the future inferior reader could be early identified and his inferiority 
prevented. No such result was achieved. In short, the inferior reader is 
too much like others to be easily and early recognized except by his one 
distinguishing characteristic, inferiority in reading. Those differences, 
other than reading, which were discovered could hardly be said to come 
before, or even after, reading retardation. The problem was to analyze 
the reading performance, background and general life adjustments of a 
group of retarded readers in the primary grades with particular reference 
to possible causal factors of inferior reading achievement. 

The children included in the study came from New York City public 
schools. They were in the second and third grades, and for purposes of 
the experiment divided into two matched groups of 50 each, a group of 
poor readers and a group of normal readers. In addition to standardized 
reading examinations, three questionnaires elicited information from the 
child himself during an interview, from the teacher, and from the parents. 

The results best supported by the data indicate that being the eldest 
child is advantageous for reading success, that speech defects are con- 
ducive to reading failure, that teachers report the poor readers as lacking 
persistence and attention, that poor readers tend to be less active physi- 
cally and more inclined to individual activities, that poor readers view 
school as unpleasant, and that they suffer other maladjustments such as 
crying, fear, headaches, and stuttering. Only the first condition, being
the eldest, may be said to exist before the child’s reading instruction be- 
gins. The others may be results of the inferior reading achievement as 
well as possible causes of reading failure. 

The author subscribes to the view that reading failure is brought about 
by a multiple cause. Reading is an intricate and complicated activity. 
It requires fine adjustments involving the entire child. The causes of 
reading failure are to be sought in the entire child and become an individual
problem. The author suggests that further research should concern
itself with the personal and social adjustment of the inferior reader
in an effort to determine what kind of child fails in reading. Also, 
research should go back into the earlier life of the child. 

Although one may question the reliability and validity of results dependent
so heavily upon questionnaires, and based on a relatively small number
of cases, the absence of any one suggested hypothesis to account for
all or even many reading failures is refreshing. Here one finds expressed 
a sane outlook that no one factor is a principal cause but rather a com- 
plex pattern of pre-existing conditions. 

Sidney Roslow,
Psychological Corporation


Local Broadcasts to Schools. Edited by Irvin Stewart. The University
of Chicago Press, Chicago, 1939. 

In this book Mr. Stewart presents reports of the experiences of six 
representative cities in presenting local broadcasts to schools. The editor 
states that the report is intended primarily for the use of school adminis- 
trators in cities in which broadcast stations are located in the preparation 
and use of programs presented specifically for the local schools. 

The six cities chosen represent populations of approximately 1,500,000; 
900,000; 320,000; 300,000; 255,000; and 35,000. The writers are, respectively:
Detroit—Paul T. Rankin; Cleveland—H. M. Buckley; Rochester,
New York—Paul C. Reed; Portland, Oregon—E. H. Whitney; Akron, 
Ohio—Josephine French; and Alameda, California—Erle A. Kenney. 
There are basic variations in funds, availability of teachers, availability 
of program material, and number of pupils in these school systems. Each 
represents a different type of school community, and presents the local 
broadcast program as built around its available aids and its organization. 

Each report covers the more important facts concerning the instigation, 
preparation, and continuance of its local broadcast plan. The following 
general topics are included: the time, length and subject of each broadcast
series; the process of selection and choice of treatment of topics;
the preparation of scripts; the selection of persons to prepare scripts, to 
broadcast, and to criticize; pupil preparation, activity, and follow-up; 
supplementary aids employed; the techniques of the presentation; adjust- 
ments necessary to the broadcasting schedule; relationships of groups 
involved; cost of preparation, performance, equipment, and supplementary
materials; statistics as to programs and their use; and, the results of 
broadcasts, including opinions of pupils, teachers, administrators, and 
station officials, as well as estimates of educational effectiveness, with an 
indication of the method of measuring that effectiveness. 

There is wide variation in the processes of broadcasting employed by 
each system. Each of the writers has given a fairly complete critique of 
his method, pointing out defects and assets as he discusses the various 
topics. A comparison of the systems points out the benefits of the at- 
tempts to offer broadcasts for classroom use, the benefits of various tech- 
niques employed, and the possible use of the local broadcasting system in 
various types of school systems, as well as the disadvantages and problems 
which are present. 

The reasons shown for the continuance of locally prepared broadcasts 
may be classified in three groups: (1) to supplement and extend the regu- 
lar instructional program of the school by the use of this beneficial means, 
(2) to interpret the work and purposes of the local schools to the public, 
and (3) to provide pupils with an opportunity to derive such educational 
experiences and aesthetic satisfactions as are believed to result from preparing
and broadcasting radio programs. The lack of objective measures
of the results of broadcasts in the field of educational broadcasting makes 
it impossible to find statistical material of any consequence evaluating the 
acceptance and advantages of these local broadcasting systems. 

It is evident that there is definite interest in the field of educational 
broadcasting, and that this educational method is being tried in several 
school systems. This first accumulated report of systems of local broad- 
casting to schools is an instrument by which we can readily compare the 
results obtained in these systems with the theories and aspirations of those 
working in the educational field. It summarizes the progress to date and 
shows rather clearly the status of local broadcasting, at the same time 
making clear the problems which must be met and solved as further prog- 
ress is made. 


Dorothy Reece,
Ohio University


NEW BOOKS AND PAMPHLETS 


Books and pamphlets for review should be sent to James P. Porter, 
Editor, JOURNAL OF APPLIED PSYCHOLOGY, Ohio University, Athens, Ohio.


American Psychology Before William James. JAY WHARTON FAY. Rutgers
University, New Brunswick, N. J., 1939. $2.50. 240 pp.

The Challenge of Progressive Education. Fourth Conference on Educa- 
tion and the Exceptional Child. The Woods School, Langhorne, Pa., 
May, 1938. 69 pp. 

The Child and His Family. CHARLOTTE BÜHLER. Translated by Henry
Beaumont. Harper & Bros., New York, 1939. $2.50. 187 pp.

Conference on Examinations, III. PAUL MONROE, Editor. Bureau of
Publications, Teachers College, Columbia University, New York City,
1939. 330 pp.

Educational Psychology. JAMES L. MURSELL. W. W. Norton & Co., New
York City, 1939. 324 pp. 

Experiments in General Psychology. NORMA V. SCHEIDEMANN. University
of Chicago Press, Chicago, Ill., 1939. $1.75. 200 pp.

General Psychology. J. P. GUILFORD. D. Van Nostrand Co., New York,
1939. $3.00. 630 pp.

Local Broadcasts to Schools. IRVIN STEWART, Editor. University of
Chicago Press, Chicago, Ill., 1939. $2.00. 239 pp. 

Mental Hygiene in Modern Education. PAUL A. WITTY AND CHARLES E.
SKINNER, Editors. Farrar & Rinehart, Inc., New York, 1939. $2.75.
529 pp. 

Papers by Students of Prof. Raymond Herbert Stetson. The Journal of 
General Psychology, 1939, 20, Provincetown, Mass. 254 pp. 

Radio in Education. Department of Public Instruction, Commonwealth 
of Pennsylvania, Harrisburg, Pa., 1939. 47 pp. 

The Rockefeller Foundation—Review for 1938. RAYMOND B. FOSDICK,
New York. 72 pp. 

Talks to Teachers on Psychology (new edition). WILLIAM JAMES. Introduction
by John Dewey and William H. Kilpatrick. Henry Holt
& Co., New York, 1939. $1.00. 238 pp.